Abstract

In this paper, we present a mathematical model that captures various factors which may influence the accuracy
of a competitive group recommendation system. We apply this model to peer review systems, e.g., conference or
research grant review, which are an essential component of our scientific community. We explore a number of important
questions: How does the number of reviews per paper affect the accuracy of the overall recommendation? Does
the score aggregation policy influence the final recommendation? How do reviewers' preferences affect the accuracy
of the final recommendation? To answer these questions, we formally analyze our model. Through this
analysis, we gain insight into how to design a randomized algorithm that is both computationally efficient
and asymptotically accurate in evaluating the accuracy of a competitive group recommendation system. We obtain
a number of interesting observations: e.g., for a medium tier conference, three reviews per paper are sufficient for a
highly accurate recommendation, whereas for prestigious conferences, one may need at least seven reviews per paper to
achieve high accuracy. We also propose a heterogeneous review strategy which requires an equal or smaller reviewing
workload, yet can improve recommendation accuracy over a homogeneous review strategy by as much as 30%. We believe
our models and methodology are important building blocks for the study of competitive group recommendation systems.

1 Introduction 

In recent years, recommendation systems [20] have received a lot of attention in both the commercial and academic
communities. Researchers have investigated various algorithmic and complexity issues [7, 10, 20, 22, 23]. At the same time,
we have also seen successful applications of recommendation systems in commercial products. In general, recommendation
systems take into account a user's preference and make a recommendation so as to maximize the user's utility. Group
recommendation systems [3], on the other hand, take into account the preferences of all users in a group to make
a single recommendation. In recent years, we have seen successful group recommendations in commercial areas.

In this paper, we consider a special class of recommendation system which we call the competitive group
recommendation system: there are $N$ users and the system makes a single recommendation to $k$ users only, where $k < N$,
while the remaining $N - k$ users receive the complement of the recommendation. Competitive group recommendation systems
have many important applications. Here, we consider an application which is dear to many researchers' hearts:
peer review systems for conferences or research grant proposals. To a certain degree, the progress of our scientific
community depends on the accuracy of this type of recommendation system. To the best of our knowledge, this is the
first paper which provides a formal mathematical analysis of such recommendation systems.

A peer review system can be briefly described as follows: there are $N$ candidates (papers or grant proposals), and a
group of reviewers is asked to review these candidates. Each reviewer evaluates a subset of these candidates based on
her preference and provides a rating for each candidate. The system uses some policy to aggregate all ratings
of all candidates and recommends only a subset of $k$ candidates for acceptance, while all other candidates receive
a rejection, which is the complement of the acceptance recommendation. For such systems, there are many interesting
questions to explore, e.g., how many reviews should each candidate receive to achieve high accuracy? What is the
probability that the best candidate will be accepted or rejected? How do reviewers' preferences influence the final
recommendation? Is one rating aggregation policy more accurate than others?

Our contributions can be summarized as follows:

• We propose a mathematical model to understand the accuracy of a competitive group recommendation system 
and apply it to conference review systems. 

• We formally analyze the model. Through this analysis, we gain the insight to create a randomized algorithm 
to evaluate the model. We show our algorithm is computationally efficient and also provides performance 
guarantees. 

• We apply our model to a conference review system and obtain many interesting insights, e.g., for a medium
tier conference, three reviews per paper can guarantee a highly accurate recommendation, but for prestigious
conferences, we need at least seven reviews per paper.

• We propose a two-round heterogeneous review strategy which outperforms the homogeneous review strategy by
as much as 30% in recommendation accuracy with the same or less reviewing workload.

The outline of the paper is as follows. In Section 2, we present the mathematical model of competitive group
recommendation systems. In Section 3, we present the analysis and derive theoretical results for the model. In Section 4, we
propose a randomized algorithm which is computationally efficient and provides performance guarantees in evaluating a
competitive group recommendation system. In Section 5, we evaluate the performance of a conference review system and
explore various factors that influence its accuracy. Related work is given in Section 6 and Section 7 concludes.

2 Mathematical Model 

Let us present the mathematical model of a competitive group recommendation system. We focus on a particular
application scenario, a conference review system, which is a representative example of peer review systems. Let
$\mathcal{P} = \{P_1, \ldots, P_N\}$ be a finite set of $N$ candidate papers. Let $Q_i \in (1, m)$ represent the intrinsic quality of paper $P_i$.
A higher value of intrinsic quality implies higher quality. Hence, $Q_i > Q_j$ means that paper $P_i$ is better than $P_j$. Without
loss of generality, let us assume $Q_1 > Q_2 > \cdots > Q_N$. It is important to emphasize that reviewers of these papers
do not have any a priori knowledge of $Q_i$, $\forall i$. The conference can only accept $k$ papers, where $1 \le k < N$. Let $\mathcal{A}^I(k)$
and $\mathcal{A}(k)$ denote the set of the $k$ accepted papers according to the intrinsic quality or according to the conference
recommendation criteria respectively. It is clear that $\mathcal{A}^I(k) = \{P_1, P_2, \ldots, P_k\}$, and if a conference recommendation
system is perfect, we should have $\mathcal{A}^I(k) = \mathcal{A}(k)$. But in general, many factors, such as reviewers' preferences, may influence
the final recommendation, hence $\mathcal{A}^I(k) \neq \mathcal{A}(k)$. To measure the accuracy of a recommendation system, we aim to
determine how many papers in $\mathcal{A}(k)$ are also in $\mathcal{A}^I(k)$. Formally, we seek to derive the following probability mass
function (pmf):

    $\Pr[|\mathcal{A}^I(k) \cap \mathcal{A}(k)| = i]$, for $i = 0, 1, \ldots, k$.

Intuitively, if the event $|\mathcal{A}^I(k) \cap \mathcal{A}(k)| = k$ occurs with high probability, then the conference recommendation system is
accurate and, at the same time, robust against different scoring and human factors.

Let $\mathcal{R}$ be a finite set of $M$ reviewers. We assume that the reviewers are independent. Reviewers do not have
direct knowledge of $Q_i$, $\forall i$, the intrinsic quality of papers, and they evaluate papers based on their own preferences.
A reviewer submits a score for each paper after reviewing it. Scores are discrete and take values in $\{1, \ldots, m\}$.
Paper $P_i$, for $i = 1, \ldots, N$, is assigned to $n_i \ge 1$ reviewers. Hence, paper $P_i$ receives $n_i$ review scores. Let
$\mathcal{S}^i = \{S^i_1, \ldots, S^i_{n_i}\}$ denote the set of $n_i$ scores of paper $P_i$, where $S^i_j \in \{1, \ldots, m\}$. Let $\mathcal{R}(S^i_j)$ represent the reviewer
who submits score $S^i_j$. Let $e^i_j \in \{1, \ldots, l\}$ represent the expertise level (or familiarity) that reviewer $\mathcal{R}(S^i_j)$ selects on
topics related to paper $P_i$. Reviewer $\mathcal{R}(S^i_j)$ submits score $S^i_j$ in conjunction with expertise level $e^i_j$. Again, we adopt
the convention that higher values represent higher quality or expertise level. There are a number of interesting questions
one can explore, e.g., how do $M$, the number of reviewers (or the size of a technical program committee), and $n_i$, $\forall i$,
the number of reviews for each paper, affect the accuracy of the final recommendation?

Let $\mathcal{V}$ be the voting rule that is used by the conference recommendation system to rank papers based on their
review scores. Generally, a voting rule works in two steps. It first aggregates the review scores of each paper into
a combined overall score. Then it ranks all papers based on their respective combined overall scores. Let $\gamma_i = \mathcal{V}(\mathcal{S}^i)$
be the combined overall score of paper $P_i$ derived from $\mathcal{S}^i$ under the voting rule $\mathcal{V}$. There can be many voting rules.
A simple and often used voting rule is the average score rule. In this case, we have $\gamma_i = \sum_{S \in \mathcal{S}^i} S / n_i$. By sorting
$\gamma_1, \ldots, \gamma_N$, we obtain a ranked list of all papers. Again, there are a number of interesting questions to explore, e.g., what
are some effective voting rules? Can one voting rule be more accurate than others?

Specifying the voting rule is not enough. Recall that the system can only accept $k$ papers. It may happen that
the combined overall score of the $k$-th ranked paper equals that of the $(k + 1)$-th ranked paper. In this case, we need
to specify a tie-breaking rule to decide which paper should be selected. Let $\mathcal{T}$ denote the tie-breaking rule. It is
interesting to explore whether the recommendation results are sensitive to a particular tie-breaking rule.
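The two-step voting rule and the tie-breaking rule described above can be sketched as follows. This is only a minimal illustration, assuming the average score rule and a uniformly random tie-breaking rule; the function name `accept_top_k` and its arguments are hypothetical and not part of the model.

```python
import random
from statistics import mean

def accept_top_k(scores, k, rng=None):
    """Return the indices of the k accepted papers.

    scores[i] holds the review scores of paper i. Papers are ranked by
    average score; equal averages are ordered uniformly at random, which
    is one possible tie-breaking rule, chosen here for illustration.
    """
    rng = rng or random.Random()
    averages = [mean(s) for s in scores]
    # Attach a random secondary key so that tied averages are shuffled.
    order = sorted(range(len(scores)),
                   key=lambda i: (averages[i], rng.random()),
                   reverse=True)
    return set(order[:k])

if __name__ == "__main__":
    reviews = [[5, 4], [3, 3], [5, 4], [1, 2]]
    print(accept_top_k(reviews, 2))   # papers 0 and 2 share the top average
```

With `k = 1` in the usage above, papers 0 and 2 are tied at the acceptance boundary, so either may be returned.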

To answer the above questions, let us now present probabilistic models describing the intrinsic quality (or the
self-selection effect), the reviewing behavior, as well as the critical degree of reviewers.

2.1 Model Intrinsic Quality via Self-selection 

It is well-known that paper submission has the self-selection effect. In other words, authors tend to submit their high 
quality papers to some highly prestigious and selective conferences, while lower tier conferences may receive papers 
with lower quality, or candidate papers have high variance in quality. The intrinsic quality of a submitted paper can be 
described as a random variable, and one can vary its mean or variance to reflect the self-selection effect. Specifically, 
a high value of mean and a small value of variance imply that the submitted papers are of high intrinsic qualities and 
these qualities have small variation only. On the other hand, a low value of mean and a large value of variance imply 
that the submitted papers have low intrinsic qualities and these qualities have high variability. 

We use $Q_i \in (1, m)$ to denote the intrinsic quality of paper $P_i$. Assume $Q_1, \ldots, Q_N$ are independent random
variables. Let $\mathcal{D}(Q_i)$ denote the probability distribution of $Q_i$. The probability distribution $\mathcal{D}(Q_i)$ is described by a
truncated normal distribution $\mathcal{N}(q_i, \sigma_i^2)$, where $q_i \in [1, m]$ is the mean and $\sigma_i^2$ is the variance. Since the value of the
intrinsic quality $Q_i$ is in $(1, m)$, $\mathcal{D}(Q_i)$ is obtained by truncating $\mathcal{N}(q_i, \sigma_i^2)$ to keep only those values in $(1, m)$ and
scaling up the kept density by $1/\Pr[1 < X < m]$, where $X$ is a random variable with probability distribution $\mathcal{N}(q_i, \sigma_i^2)$.
It should be clear that after truncation, the mean $q_i$ and the variance $\sigma_i^2$ can still reflect the self-selection effect. Here,
we use the following parameters to reflect four representative types of self-selectivity:

High self-selectivity: the mean $q_i$ and variance $\sigma_i^2$ are specified by

    $q_i = m$, $\sigma_i^2 = 1$, for $i = 1, \ldots, N$. (1)

This indicates that papers tend to have high intrinsic quality (or high mean), and most of the probability mass concen- 
trates around high intrinsic quality. Top tier conferences fall into this category. 

Medium self-selectivity: the mean $q_i$ and variance $\sigma_i^2$ are

    $q_i = (m + 1)/2$, $\sigma_i^2 = 1$, for $i = 1, \ldots, N$. (2)

This reflects that papers tend to have an average intrinsic quality and most of the probability mass concentrates around 
the average intrinsic quality. Medium tier conferences fall into this category. 

Low self-selectivity: the mean $q_i$ and variance $\sigma_i^2$ are

    $q_i = 1$, $\sigma_i^2 = 1$, for $i = 1, \ldots, N$. (3)

This indicates that papers tend to have low intrinsic quality (or low mean), and most of the probability mass concen- 
trates around low quality. Low tier conferences fall into this category. 

Random self-selectivity: the variance $\sigma_i^2$ is

    $\sigma_i^2 = \infty$, for $i = 1, \ldots, N$. (4)

So $\mathcal{D}(Q_i)$ converges to a uniform distribution on $(1, m)$. This means that the intrinsic qualities of submitted papers are
uniformly distributed. Newly started conferences fall into this category. This is because a newly started conference has 
not built up a reputation yet, thus researchers are not sure if it is a good conference, which results in random quality in 
submission. 
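The four self-selectivity types can be simulated with a few lines of code. The sketch below is an illustration under stated assumptions: it draws from the truncated normal distribution by simple rejection sampling, and the names `sample_quality` and `sample_papers` are hypothetical helpers, not part of the model.

```python
import random

def sample_quality(m, selectivity, rng):
    """Draw one intrinsic quality Q for the given self-selectivity type."""
    if selectivity == "random":
        # Infinite variance: the truncated normal becomes uniform between 1 and m.
        return rng.uniform(1, m)
    mean = {"high": m, "medium": (m + 1) / 2, "low": 1}[selectivity]
    # Rejection sampling from N(mean, 1) truncated to the open interval (1, m).
    while True:
        x = rng.gauss(mean, 1.0)
        if 1 < x < m:
            return x

def sample_papers(n_papers, m, selectivity, seed=0):
    """Return n_papers qualities sorted so that index 0 is the best paper."""
    rng = random.Random(seed)
    qualities = (sample_quality(m, selectivity, rng) for _ in range(n_papers))
    return sorted(qualities, reverse=True)

if __name__ == "__main__":
    for kind in ("high", "medium", "low", "random"):
        qs = sample_papers(1000, 5, kind)
        print(kind, round(sum(qs) / len(qs), 2))
```

Sorting in decreasing order matches the convention $Q_1 > Q_2 > \cdots > Q_N$, so the intrinsic top-$k$ set is simply the first $k$ entries.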
2.2 Model for Reviewing Behavior

When reviewing a paper, a reviewer needs to evaluate its quality. Here we assume that each reviewer is fair, unbiased and
critical. We consider the two most important factors that affect the evaluation. The first one is $Q_i$, the intrinsic quality of
paper $P_i$, and the second one is the critical degree of the reviewer. Specifically, when $Q_i$ is high, the evaluated quality
of $P_i$ is more likely to be high. And the higher the critical degree of the reviewer, the more likely the evaluated
quality is to be close to the intrinsic quality of that paper. The reviewing behavior can be described by a random
variable, and one can vary its mean and variance to reflect the intrinsic quality and the critical degree.

To illustrate, consider a paper $P_i$ and one of its scores $S^i_j$. Recall that the reviewer who submits score $S^i_j$ is denoted
by $\mathcal{R}(S^i_j)$. Let $c^i_j \in [0, 1]$ denote the critical degree of reviewer $\mathcal{R}(S^i_j)$. With the usual convention, a higher value
represents a higher critical degree. The score $S^i_j$ is a random variable with probability distribution $\mathcal{D}(S^i_j)$, which should
have the following two properties:

Property 1: The mean should be equal to $Q_i$. The physical meaning is that a reviewer is unbiased.

Property 2: The variance should reflect the critical degree of a reviewer. Specifically, the higher the critical degree,
the lower the variance of the probability distribution $\mathcal{D}(S^i_j)$.

In our study, the probability distribution $\mathcal{D}(S^i_j)$ is obtained by mapping a normal distribution $\mathcal{N}(Q_i, \sigma^2(c^i_j))$ to
a discrete distribution. Note that the standard deviation $\sigma(c^i_j)$ is a monotonically decreasing function of $c^i_j$, and we will
specify it in a later section. The probability distribution mapping can be described by the following two steps:

Discretization: Transform a normal distribution into a discrete distribution. We transform the normal distribution
$\mathcal{N}(Q_i, \sigma^2(c^i_j))$ into a discrete random variable $L$ with probability distribution
$\widetilde{\mathcal{N}}(Q_i, \sigma^2(c^i_j))$ and values in $\{1, \ldots, m\}$. The pmf of $L$ is:

    $\Pr[L = \ell] = \frac{\Pr[\ell - 0.5 < X < \ell + 0.5]}{\Pr[0.5 < X < m + 0.5]} = \frac{\Phi((\ell + 0.5 - Q_i)/\sigma(c^i_j)) - \Phi((\ell - 0.5 - Q_i)/\sigma(c^i_j))}{\Phi((m + 0.5 - Q_i)/\sigma(c^i_j)) - \Phi((0.5 - Q_i)/\sigma(c^i_j))}$, for $\ell = 1, \ldots, m$, (5)

where $\Phi(z) = \int_{-\infty}^{z} \exp(-t^2/2)/\sqrt{2\pi}\, dt$ and the probability distribution of $X$ is $\mathcal{N}(Q_i, \sigma^2(c^i_j))$. Note that this
discrete distribution satisfies Property 2 but not Property 1. In the following step, we adjust the distribution so that it
satisfies Property 1 as well.

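Adjustment: Adjust the distribution $\widetilde{\mathcal{N}}(Q_i, \sigma^2(c^i_j))$ such that its mean equals $Q_i$. The idea is that if $E[L] < Q_i$,
then we increase the mean of $\widetilde{\mathcal{N}}(Q_i, \sigma^2(c^i_j))$ by scaling up the probabilities

    $\Pr[L = \ell]$, for all $\ell = \lfloor Q_i \rfloor + 1, \ldots, m$.

Otherwise, we decrease the mean by scaling them down. Applying this idea to adjust the mean of $\widetilde{\mathcal{N}}(Q_i, \sigma^2(c^i_j))$, we obtain
the probability distribution $\mathcal{D}(S^i_j)$. The pmf of $\mathcal{D}(S^i_j)$ is:

    $\Pr[S^i_j = \ell] = \begin{cases} [1 - \beta(Q_i, c^i_j)] \Pr[L = \ell], & \text{for all } \ell = 1, \ldots, \lfloor Q_i \rfloor, \\ [1 + \alpha(Q_i, c^i_j)] \Pr[L = \ell], & \text{for all } \ell = \lfloor Q_i \rfloor + 1, \ldots, m, \end{cases}$ (6)

where $\alpha(Q_i, c^i_j)$ and $\beta(Q_i, c^i_j)$ are determined by requiring that the adjusted probabilities sum to one and that their mean equals $Q_i$:

    $\alpha(Q_i, c^i_j) = \frac{(Q_i - E[L]) \sum_{\ell=1}^{\lfloor Q_i \rfloor} \Pr[L = \ell]}{\sum_{\ell=1}^{\lfloor Q_i \rfloor} \Pr[L = \ell](E[L] - \ell)}$, (7)

    $\beta(Q_i, c^i_j) = \frac{(Q_i - E[L]) \sum_{\ell=\lfloor Q_i \rfloor + 1}^{m} \Pr[L = \ell]}{\sum_{\ell=1}^{\lfloor Q_i \rfloor} \Pr[L = \ell](E[L] - \ell)}$, (8)

and $L$ is a discrete random variable with probability distribution $\widetilde{\mathcal{N}}(Q_i, \sigma^2(c^i_j))$. Note that $\mathcal{D}(S^i_j)$ satisfies both
Property 1 and 2.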
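The discretization and adjustment steps can be checked numerically. The following sketch is an illustration only: it assumes that the two scaling factors are the unique pair for which the rescaled probabilities sum to one and have mean $Q_i$, and the function names `discretized_pmf` and `score_pmf` are hypothetical.

```python
import math

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def discretized_pmf(q, sigma, m):
    """Discretization step: pmf of L on {1, ..., m} obtained from N(q, sigma^2)."""
    denom = phi((m + 0.5 - q) / sigma) - phi((0.5 - q) / sigma)
    return [(phi((l + 0.5 - q) / sigma) - phi((l - 0.5 - q) / sigma)) / denom
            for l in range(1, m + 1)]

def score_pmf(q, sigma, m):
    """Adjustment step: rescale the pmf of L so that its mean equals q."""
    p = discretized_pmf(q, sigma, m)
    split = math.floor(q)            # scores 1..split are scaled by (1 - beta)
    mean_l = sum(l * pl for l, pl in enumerate(p, start=1))
    low_mass = sum(p[:split])
    high_mass = sum(p[split:])
    # Assumed closed form: solve the normalization and mean constraints.
    d = sum(pl * (mean_l - l) for l, pl in enumerate(p[:split], start=1))
    alpha = (q - mean_l) * low_mass / d
    beta = (q - mean_l) * high_mass / d
    return [(1 - beta) * pl if l <= split else (1 + alpha) * pl
            for l, pl in enumerate(p, start=1)]

if __name__ == "__main__":
    pmf = score_pmf(q=3.3, sigma=1.0, m=5)
    print([round(x, 4) for x in pmf])
    print(sum(l * pl for l, pl in enumerate(pmf, start=1)))  # close to 3.3
```

The sketch does not guard against extreme parameter choices for which a rescaled probability could leave the interval $[0, 1]$.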
2.3 Model for Critical Degree

To model the critical degree of a reviewer, we classify papers and reviewers into "types". Specifically, a paper can
be of many types (e.g., system paper, theory paper, etc.), and reviewers can be of many types also (e.g., preferring system
papers, or theory papers, etc.). If a paper-reviewer pairing is of the same type, then the expertise level and the critical
degree of the reviewer will be of high values; otherwise they will be of low values.

To illustrate, consider a paper $P \in \mathcal{P}$ and a reviewer $R \in \mathcal{R}$. Assume reviewer $R$ reviews paper $P$. Let $\mu \in [0, 1]$
denote the matching degree between reviewer $R$ and paper $P$, and let $e$, $c$ denote the corresponding expertise level
and critical degree respectively. Using our usual convention, a higher value represents a higher matching degree. The
matching degree couples the expertise level and the critical degree in the following manner:

    $e = \kappa$, if $\mu \in [(\kappa - 1)/l, \kappa/l)$, for $\kappa = 1, \ldots, l$,

    $c = f(\mu) \in [0, 1]$, where $\mu \in [0, 1]$. (9)

Note that $e = l$ when $\mu = 1$ and $f(\mu)$ is a monotonically increasing function of $\mu$. There are a number of choices for the function
$f(\mu)$, e.g., $f(\mu) = \mu$, or $f(\mu) = \mu^2$, etc. We will specify it in a later section.
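As a small illustration of this coupling, the mapping from a matching degree to an expertise level and a critical degree can be written as below. The function names and the default choice of the identity for $f$ are assumptions made for this sketch only.

```python
def expertise_level(mu, levels):
    """Map a matching degree mu in [0, 1] to a level in {1, ..., levels}."""
    # mu in [(kappa - 1)/levels, kappa/levels) gives kappa; mu = 1 gives the top level.
    return min(int(mu * levels) + 1, levels)

def critical_degree(mu, f=lambda x: x):
    """c = f(mu) for a monotonically increasing f; the identity is one choice."""
    return f(mu)

if __name__ == "__main__":
    for mu in (0.0, 0.5, 1.0):
        print(mu, expertise_level(mu, 4), critical_degree(mu, lambda x: x ** 2))
```

Values of `mu` that sit exactly on an interval boundary may be affected by floating-point rounding in `mu * levels`.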

Again, there are a number of interesting questions to explore, e.g., is the conference recommendation system sensitive
to the paper-reviewer matching? Will a small percentage of reviewers who prefer theory create a large inaccuracy in
the final recommendation of a system-oriented conference (or vice versa)?

3 Theoretical Analysis 

Recall that $\mathcal{A}^I(k)$ and $\mathcal{A}(k)$ denote the set of $k$ accepted papers according to the intrinsic quality or according to
the conference recommendation criteria respectively. In this section, we first derive the following probability mass
function (pmf):

    $\Pr[|\mathcal{A}^I(k) \cap \mathcal{A}(k)| = i]$, for $i = 0, 1, \ldots, k$.

With this pmf, we can then derive the expectation $E[|\mathcal{A}^I(k) \cap \mathcal{A}(k)|]$ and variance $Var[|\mathcal{A}^I(k) \cap \mathcal{A}(k)|]$. These
probability measures can provide us with a lot of insights, e.g., if $|\mathcal{A}^I(k) \cap \mathcal{A}(k)| = k$ occurs with high probability,
or $E[|\mathcal{A}^I(k) \cap \mathcal{A}(k)|] \approx k$, then the conference recommendation system is very accurate and robust against different
human factors, and if $Var[|\mathcal{A}^I(k) \cap \mathcal{A}(k)|]$ is small, then the conference recommendation system is very stable, i.e.,
$|\mathcal{A}^I(k) \cap \mathcal{A}(k)|$ is likely to be close to its expectation. To derive this pmf, let us first consider the following special case. The purpose is
to show the general idea of the derivation and to illustrate the underlying computational complexity. We will consider the
derivation of the general case later.

3.1 Derivation of the Special Case 

Let us consider a conference recommendation system which has only one type of papers and one type of reviewers
(e.g., all papers are theoretical and all reviewers prefer theoretical topics). Hence, the critical degree of all reviewers
is the same, say $c$. The intrinsic quality of each paper is specified as follows:

    $Q_i = m - i(m - 1)/(N + 1)$, for $i = 1, \ldots, N$. (10)

Each paper will have the same number of reviews, or $n_i = n$, $\forall i$. The voting rule $\mathcal{V}$ is the average score rule and we
use a random rule to break ties between papers whose scores are the same.

3.1.1 Theoretical derivation 

The score set for paper $P_i$ is $\{S^i_1, \ldots, S^i_n\}$. Recall that a score is described by a random variable, and its probability
distribution is uniquely determined by the intrinsic quality of the corresponding paper and the critical degree of the
corresponding reviewer. Since the critical degree of each reviewer is the same $c$, $S^i_1, \ldots, S^i_n$ are i.i.d. random
variables. By specifying $Q_i$ and $c^i_j$ in Eq. (6) with Eq. (10) and $c^i_j = c$ respectively, we obtain the pmf of score $S^i_j$.
This is stated in the following lemma.
Lemma 1 The pmf of score $S^i_j$, for $i = 1, \ldots, N$, $j = 1, \ldots, n$, is

    $\Pr[S^i_j = \ell] = \begin{cases} [1 - \beta(m - i(m - 1)/(N + 1), c)] \Pr[L = \ell], & \ell = 1, \ldots, \lfloor Q_i \rfloor, \\ [1 + \alpha(m - i(m - 1)/(N + 1), c)] \Pr[L = \ell], & \ell = \lfloor Q_i \rfloor + 1, \ldots, m, \\ 0, & \text{otherwise}, \end{cases}$ (11)

where $L$ is a discrete random variable with probability distribution $\widetilde{\mathcal{N}}(m - i(m - 1)/(N + 1), \sigma^2(c))$ whose pmf is
derived by Eq. (5), and $\alpha(m - i(m - 1)/(N + 1), c)$, $\beta(m - i(m - 1)/(N + 1), c)$ are derived by Eq. (7) and (8)
respectively.

The average score of each paper is $\gamma_i = \sum_{j=1}^{n} S^i_j / n$. The probability mass function (pmf) of the average score
of each paper is specified in the following lemma.

Lemma 2 The pmf of the average score $\gamma_i$, $\forall i$, is

    $\Pr[\gamma_i = \ell/n] = \sum_{s_1 + \cdots + s_n = \ell} \prod_{j=1}^{n} \Pr[S^i_j = s_j]$, for all $\ell = n, \ldots, nm$, (12)

and its cumulative distribution function (CDF) is

    $\Pr[\gamma_i \le \ell/n] = \sum_{s_1 + \cdots + s_n \le \ell} \prod_{j=1}^{n} \Pr[S^i_j = s_j]$, for all $\ell = n, \ldots, nm$, (13)

where $\Pr[S^i_j = s_j]$ is specified in Eq. (11).

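Proof: Note that the ratings of each paper are independent random variables. The distribution of each rating has been
derived in Lemma 1. Since $\gamma_i = \sum_{j=1}^{n} S^i_j / n$, by enumerating all the cases satisfying the condition $\sum_{j=1}^{n} s_j = \ell$,
we obtain the pmf of $\gamma_i$, or

    $\Pr[\gamma_i = \ell/n] = \Pr[\sum_{j=1}^{n} S^i_j = \ell] = \sum_{s_1 + \cdots + s_n = \ell} \prod_{j=1}^{n} \Pr[S^i_j = s_j]$.

Similarly, by enumerating all the cases satisfying the condition $\sum_{j=1}^{n} s_j \le \ell$, we obtain the CDF of $\gamma_i$, or

    $\Pr[\gamma_i \le \ell/n] = \Pr[\sum_{j=1}^{n} S^i_j \le \ell] = \sum_{s_1 + \cdots + s_n \le \ell} \prod_{j=1}^{n} \Pr[S^i_j = s_j]$,

which completes the proof.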
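For numerical work, the sums over all score combinations in Lemma 2 need not be enumerated term by term. A minimal sketch, assuming i.i.d. scores as in the special case, is to build the distribution of the score total by repeated convolution; the helper names `sum_pmf` and `cdf_from_pmf` are hypothetical.

```python
def sum_pmf(score_pmf, n):
    """pmf of S_1 + ... + S_n for n i.i.d. scores on {1, ..., m}.

    score_pmf[l - 1] = Pr[S = l]. Returns {total: probability} for
    total = n, ..., n*m, so Pr[gamma = total / n] = result[total].
    """
    dist = {0: 1.0}
    for _ in range(n):
        nxt = {}
        for total, p in dist.items():
            for l, pl in enumerate(score_pmf, start=1):
                nxt[total + l] = nxt.get(total + l, 0.0) + p * pl
        dist = nxt
    return dist

def cdf_from_pmf(dist):
    """Pr[sum <= total] for every attainable total."""
    cdf, running = {}, 0.0
    for total in sorted(dist):
        running += dist[total]
        cdf[total] = running
    return cdf

if __name__ == "__main__":
    pmf = sum_pmf([0.5, 0.5], 2)      # two reviews, scores in {1, 2}
    print(pmf)
    print(cdf_from_pmf(pmf))
```

Each convolution step costs on the order of $nm \cdot m$ operations, so the per-paper pmf and CDF are cheap; the exponential cost discussed later comes from enumerating sets of papers, not from this step.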



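Based on the results derived in Lemma 1 and Lemma 2, the probability that a specific set of papers is accepted is
stated as follows.

Theorem 1 Let $\{P_{i_1}, \ldots, P_{i_k}\}$ be a set of $k$ papers. The probability that the accepted paper set equals $\{P_{i_1}, \ldots, P_{i_k}\}$
is:

    $\Pr[\mathcal{A}(k) = \{P_{i_1}, \ldots, P_{i_k}\}] = \sum_{\ell=n}^{nm} \left( \prod_{i \in \mathcal{I}} \Pr[\gamma_i \ge \ell/n] - \prod_{i \in \mathcal{I}} \Pr[\gamma_i \ge (\ell + 1)/n] \right) \prod_{j \in \bar{\mathcal{I}}} \Pr[\gamma_j \le (\ell - 1)/n]$
    $\quad + \sum_{\mathcal{F} \subseteq \mathcal{I}, \mathcal{F} \neq \emptyset} \sum_{\mathcal{G} \subseteq \bar{\mathcal{I}}, \mathcal{G} \neq \emptyset} \binom{|\mathcal{F}| + |\mathcal{G}|}{|\mathcal{F}|}^{-1} \sum_{\ell=n}^{nm} \prod_{i \in \mathcal{I} \setminus \mathcal{F}} (1 - \Pr[\gamma_i \le \ell/n]) \prod_{j \in \mathcal{F} \cup \mathcal{G}} \Pr[\gamma_j = \ell/n] \prod_{\kappa \in \bar{\mathcal{I}} \setminus \mathcal{G}} \Pr[\gamma_\kappa \le (\ell - 1)/n]$, (14)

where $\mathcal{I} = \{i_1, \ldots, i_k\}$ is the index set of $\{P_{i_1}, \ldots, P_{i_k}\}$, $\bar{\mathcal{I}} = \{1, \ldots, N\} \setminus \mathcal{I}$ is the complement of $\mathcal{I}$, and
$\Pr[\gamma_i \ge \ell/n] = 1 - \Pr[\gamma_i \le (\ell - 1)/n]$. $\Pr[\gamma_i = \ell/n]$ is specified in Eq. (12), and $\Pr[\gamma_i \le \ell/n]$ is specified in Eq. (13).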
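Proof: Let $\mathcal{H} = \{P_{i_1}, \ldots, P_{i_k}\}$. Let $\bar{\mathcal{H}} = \{P_1, \ldots, P_N\} \setminus \mathcal{H}$ be the complement of $\mathcal{H}$. Let $\gamma_{\min}(\mathcal{H}) = \min\{\gamma_i, i \in \mathcal{I}\}$
denote the minimum average score of paper set $\mathcal{H}$. Let $\gamma_{\max}(\bar{\mathcal{H}}) = \max\{\gamma_i, i \in \bar{\mathcal{I}}\}$ denote the maximum average score
of paper set $\bar{\mathcal{H}}$. The probability that the accepted paper set equals $\{P_{i_1}, \ldots, P_{i_k}\}$ can be divided into the following
three parts:

    $\Pr[\mathcal{A}(k) = \{P_{i_1}, \ldots, P_{i_k}\}] = \Pr[\mathcal{A}(k) = \mathcal{H}, \gamma_{\min}(\mathcal{H}) < \gamma_{\max}(\bar{\mathcal{H}})] + \Pr[\mathcal{A}(k) = \mathcal{H}, \gamma_{\min}(\mathcal{H}) > \gamma_{\max}(\bar{\mathcal{H}})] + \Pr[\mathcal{A}(k) = \mathcal{H}, \gamma_{\min}(\mathcal{H}) = \gamma_{\max}(\bar{\mathcal{H}})]$. (15)

Let us derive these three terms one by one.

According to our voting rule, the probability that the accepted paper set $\mathcal{A}(k)$ equals $\mathcal{H}$ is 0 conditioned on $\gamma_{\min}(\mathcal{H})$ being less
than $\gamma_{\max}(\bar{\mathcal{H}})$. Thus,

    $\Pr[\mathcal{A}(k) = \mathcal{H}, \gamma_{\min}(\mathcal{H}) < \gamma_{\max}(\bar{\mathcal{H}})] = \Pr[\gamma_{\min}(\mathcal{H}) < \gamma_{\max}(\bar{\mathcal{H}})] \Pr[\mathcal{A}(k) = \mathcal{H} \mid \gamma_{\min}(\mathcal{H}) < \gamma_{\max}(\bar{\mathcal{H}})] = 0$. (16)

According to our voting rule, the probability that the accepted paper set $\mathcal{A}(k)$ equals $\mathcal{H}$ is 1 conditioned on $\gamma_{\min}(\mathcal{H}) > \gamma_{\max}(\bar{\mathcal{H}})$.
Thus,

    $\Pr[\mathcal{A}(k) = \mathcal{H}, \gamma_{\min}(\mathcal{H}) > \gamma_{\max}(\bar{\mathcal{H}})] = \Pr[\gamma_{\min}(\mathcal{H}) > \gamma_{\max}(\bar{\mathcal{H}})] \Pr[\mathcal{A}(k) = \mathcal{H} \mid \gamma_{\min}(\mathcal{H}) > \gamma_{\max}(\bar{\mathcal{H}})] = \Pr[\gamma_{\min}(\mathcal{H}) > \gamma_{\max}(\bar{\mathcal{H}})]$. (17)

Note that $\gamma_i = \sum_{j=1}^{n} S^i_j / n$, $\forall i$. Since the scores $S^i_j$, $\forall i, j$, are independent random variables, the average
scores $\gamma_1, \ldots, \gamma_N$ are also independent random variables. Based on this fact, we derive the analytical expression
of $\Pr[\gamma_{\min}(\mathcal{H}) > \gamma_{\max}(\bar{\mathcal{H}})]$ as

    $\Pr[\gamma_{\min}(\mathcal{H}) > \gamma_{\max}(\bar{\mathcal{H}})] = \sum_{\ell=n}^{nm} \Pr[\gamma_{\min}(\mathcal{H}) = \ell/n] \Pr[\gamma_{\max}(\bar{\mathcal{H}}) < \ell/n]$
    $\quad = \sum_{\ell=n}^{nm} \left( \prod_{i \in \mathcal{I}} \Pr[\gamma_i \ge \ell/n] - \prod_{i \in \mathcal{I}} \Pr[\gamma_i \ge (\ell + 1)/n] \right) \prod_{j \in \bar{\mathcal{I}}} \Pr[\gamma_j \le (\ell - 1)/n]$. (18)

The remaining task is to derive the last term of Eq. (15). When $\gamma_{\min}(\mathcal{H}) = \gamma_{\max}(\bar{\mathcal{H}})$ occurs, tie breaking will
be performed on the set of papers with average score equal to $\gamma_{\min}(\mathcal{H})$. Let us provide some notation first. Let
$\mathcal{F} = \{i \mid \gamma_i = \gamma_{\min}(\mathcal{H}), i \in \mathcal{I}\}$ be the index set of the papers that belong to set $\mathcal{H}$ and have average scores equal to the
minimum average score of $\mathcal{H}$. Let $\mathcal{G} = \{i \mid \gamma_i = \gamma_{\min}(\mathcal{H}), i \in \bar{\mathcal{I}}\}$ be the index set of the papers that belong to set $\bar{\mathcal{H}}$
and have average scores equal to the minimum average score of $\mathcal{H}$. Thus, tie breaking will be performed on the papers with
index set $\mathcal{F} \cup \mathcal{G}$, from which only $|\mathcal{F}|$ papers will be selected for acceptance. By enumerating all possible tie-breaking
paper sets, we can write the last term of Eq. (15) in the following form:

    $\Pr[\mathcal{A}(k) = \mathcal{H}, \gamma_{\min}(\mathcal{H}) = \gamma_{\max}(\bar{\mathcal{H}})] = \sum_{\mathcal{F} \subseteq \mathcal{I}, \mathcal{F} \neq \emptyset} \sum_{\mathcal{G} \subseteq \bar{\mathcal{I}}, \mathcal{G} \neq \emptyset} \Pr[\mathcal{F} \cup \mathcal{G}] \Pr[\mathcal{F} \mid \mathcal{F} \cup \mathcal{G}]$, (19)

where $\Pr[\mathcal{F} \cup \mathcal{G}]$ is the probability that tie breaking is performed on papers with index set $\mathcal{F} \cup \mathcal{G}$, and $\Pr[\mathcal{F} \mid \mathcal{F} \cup \mathcal{G}]$ is
the conditional probability that the papers with index set $\mathcal{F}$ are selected for acceptance under the condition that tie breaking
is performed on the papers with index set $\mathcal{F} \cup \mathcal{G}$. Since the tie-breaking rule is the random rule, under which we just
randomly pick $|\mathcal{F}|$ papers,

    $\Pr[\mathcal{F} \mid \mathcal{F} \cup \mathcal{G}] = \binom{|\mathcal{F}| + |\mathcal{G}|}{|\mathcal{F}|}^{-1}$. (20)

Because the average scores $\gamma_1, \ldots, \gamma_N$ are independent random variables, we can derive $\Pr[\mathcal{F} \cup \mathcal{G}]$ as:

    $\Pr[\mathcal{F} \cup \mathcal{G}] = \Pr[\gamma_i > \gamma_{\min}(\mathcal{H}) \text{ for all } i \in \mathcal{I} \setminus \mathcal{F}] \times \Pr[\gamma_j = \gamma_{\min}(\mathcal{H}) \text{ for all } j \in \mathcal{F} \cup \mathcal{G}] \times \Pr[\gamma_\kappa < \gamma_{\min}(\mathcal{H}) \text{ for all } \kappa \in \bar{\mathcal{I}} \setminus \mathcal{G}]$
    $\quad = \sum_{\ell=n}^{nm} \prod_{i \in \mathcal{I} \setminus \mathcal{F}} (1 - \Pr[\gamma_i \le \ell/n]) \prod_{j \in \mathcal{F} \cup \mathcal{G}} \Pr[\gamma_j = \ell/n] \prod_{\kappa \in \bar{\mathcal{I}} \setminus \mathcal{G}} \Pr[\gamma_\kappa \le (\ell - 1)/n]$. (21)

Combining Eq. (19) - (21), we obtain

    $\Pr[\mathcal{A}(k) = \mathcal{H}, \gamma_{\min}(\mathcal{H}) = \gamma_{\max}(\bar{\mathcal{H}})] = \sum_{\mathcal{F} \subseteq \mathcal{I}, \mathcal{F} \neq \emptyset} \sum_{\mathcal{G} \subseteq \bar{\mathcal{I}}, \mathcal{G} \neq \emptyset} \binom{|\mathcal{F}| + |\mathcal{G}|}{|\mathcal{F}|}^{-1} \sum_{\ell=n}^{nm} \prod_{i \in \mathcal{I} \setminus \mathcal{F}} (1 - \Pr[\gamma_i \le \ell/n]) \prod_{j \in \mathcal{F} \cup \mathcal{G}} \Pr[\gamma_j = \ell/n] \prod_{\kappa \in \bar{\mathcal{I}} \setminus \mathcal{G}} \Pr[\gamma_\kappa \le (\ell - 1)/n]$. (22)

Combining Eq. (15) - (18) and (22), we obtain Eq. (14). ■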

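To illustrate the analytical expression of $\Pr[\mathcal{A}(k) = \{P_{i_1}, \ldots, P_{i_k}\}]$, consider a simple example where $n = 1$,
$m = 2$, $N = 3$ and $k = 1$. Table 1 shows the analytical expression of $\Pr[\mathcal{A}(1) = \{P_{i_1}\}]$.

$\{P_{i_1}\}$    $\Pr[\mathcal{A}(1) = \{P_{i_1}\}]$

$\{P_1\}$    $\Pr[\gamma_1 = 2]\Pr[\gamma_2 = 1]\Pr[\gamma_3 = 1] + \Pr[\gamma_1 = 2]\Pr[\gamma_2 = 2]\Pr[\gamma_3 = 1]/2 +$
             $\Pr[\gamma_1 = 2]\Pr[\gamma_3 = 2]\Pr[\gamma_2 = 1]/2 + \Pr[\gamma_1 = 1]\Pr[\gamma_3 = 1]\Pr[\gamma_2 = 1]/3 +$
             $\Pr[\gamma_1 = 2]\Pr[\gamma_3 = 2]\Pr[\gamma_2 = 2]/3$

$\{P_2\}$    $\Pr[\gamma_2 = 2]\Pr[\gamma_1 = 1]\Pr[\gamma_3 = 1] + \Pr[\gamma_2 = 2]\Pr[\gamma_1 = 2]\Pr[\gamma_3 = 1]/2 +$
             $\Pr[\gamma_2 = 2]\Pr[\gamma_3 = 2]\Pr[\gamma_1 = 1]/2 + \Pr[\gamma_2 = 1]\Pr[\gamma_3 = 1]\Pr[\gamma_1 = 1]/3 +$
             $\Pr[\gamma_2 = 2]\Pr[\gamma_3 = 2]\Pr[\gamma_1 = 2]/3$

$\{P_3\}$    $\Pr[\gamma_3 = 2]\Pr[\gamma_2 = 1]\Pr[\gamma_1 = 1] + \Pr[\gamma_3 = 2]\Pr[\gamma_2 = 2]\Pr[\gamma_1 = 1]/2 +$
             $\Pr[\gamma_3 = 2]\Pr[\gamma_1 = 2]\Pr[\gamma_2 = 1]/2 + \Pr[\gamma_3 = 1]\Pr[\gamma_1 = 1]\Pr[\gamma_2 = 1]/3 +$
             $\Pr[\gamma_3 = 2]\Pr[\gamma_1 = 2]\Pr[\gamma_2 = 2]/3$

Table 1: Examples of the analytical expression of $\Pr[\mathcal{A}(1) = \{P_{i_1}\}]$, where $i_1 = 1, \ldots, 3$.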



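Up to now, we have derived the probability that a specific set of papers is accepted. Let us now derive the pmf

    $\Pr[|\mathcal{A}^I(k) \cap \mathcal{A}(k)| = i]$, for $i = 0, 1, \ldots, k$,

which is shown in the following theorem.

Theorem 2 The pmf of $|\mathcal{A}^I(k) \cap \mathcal{A}(k)|$ is:

    $\Pr[|\mathcal{A}^I(k) \cap \mathcal{A}(k)| = i] = \sum_{\mathcal{F} \subseteq \mathcal{A}^I(k), |\mathcal{F}| = i} \; \sum_{\mathcal{G} \subseteq \bar{\mathcal{A}}^I(k), |\mathcal{G}| = k - i} \Pr[\mathcal{A}(k) = \mathcal{F} \cup \mathcal{G}]$, for all $i = 0, 1, \ldots, k$, (23)

where $\bar{\mathcal{A}}^I(k) = \{P_1, \ldots, P_N\} \setminus \mathcal{A}^I(k)$ is the complement of $\mathcal{A}^I(k)$ and $\Pr[\mathcal{A}(k) = \mathcal{F} \cup \mathcal{G}]$ is given in Eq. (14).

Proof: The paper set $\mathcal{A}(k)$ can be divided into two disjoint subsets, of which one is $\mathcal{A}(k) \cap \mathcal{A}^I(k)$ and the other
is $\mathcal{A}(k) \cap \bar{\mathcal{A}}^I(k)$. Note that we have derived the probability that a specific set of papers is accepted in Eq. (14). Then,
by enumerating all the subsets of $\mathcal{A}^I(k)$ with cardinality $i$ and all the subsets of $\bar{\mathcal{A}}^I(k)$ with cardinality $k - i$, we obtain the
probability $\Pr[|\mathcal{A}^I(k) \cap \mathcal{A}(k)| = i]$, or

    $\Pr[|\mathcal{A}^I(k) \cap \mathcal{A}(k)| = i] = \sum_{\mathcal{F} \subseteq \mathcal{A}^I(k), |\mathcal{F}| = i} \; \sum_{\mathcal{G} \subseteq \bar{\mathcal{A}}^I(k), |\mathcal{G}| = k - i} \Pr[\mathcal{A}(k) = \mathcal{F} \cup \mathcal{G}]$, for all $i = 0, 1, \ldots, k$,

where $\Pr[\mathcal{A}(k) = \mathcal{F} \cup \mathcal{G}]$ is given in Eq. (14). ■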
$i$    $\Pr[|\mathcal{A}^I(1) \cap \mathcal{A}(1)| = i]$

1      $\Pr[\gamma_1 = 2]\Pr[\gamma_2 = 1]\Pr[\gamma_3 = 1] + \Pr[\gamma_1 = 2]\Pr[\gamma_2 = 2]\Pr[\gamma_3 = 1]/2 +$
       $\Pr[\gamma_1 = 2]\Pr[\gamma_3 = 2]\Pr[\gamma_2 = 1]/2 + \Pr[\gamma_1 = 1]\Pr[\gamma_3 = 1]\Pr[\gamma_2 = 1]/3 +$
       $\Pr[\gamma_1 = 2]\Pr[\gamma_3 = 2]\Pr[\gamma_2 = 2]/3$

0      $\Pr[\gamma_2 = 2]\Pr[\gamma_1 = 1]\Pr[\gamma_3 = 1] + \Pr[\gamma_3 = 2]\Pr[\gamma_2 = 1]\Pr[\gamma_1 = 1] +$
       $\Pr[\gamma_2 = 2]\Pr[\gamma_1 = 2]\Pr[\gamma_3 = 1]/2 + \Pr[\gamma_3 = 2]\Pr[\gamma_1 = 2]\Pr[\gamma_2 = 1]/2 +$
       $\Pr[\gamma_2 = 2]\Pr[\gamma_3 = 2]\Pr[\gamma_1 = 1] + \Pr[\gamma_2 = 1]\Pr[\gamma_3 = 1]\Pr[\gamma_1 = 1] \times 2/3 +$
       $\Pr[\gamma_2 = 2]\Pr[\gamma_3 = 2]\Pr[\gamma_1 = 2] \times 2/3$

Table 2: Examples of the analytical expression of $\Pr[|\mathcal{A}^I(1) \cap \mathcal{A}(1)| = i]$, where $i = 0, 1$.

To illustrate the analytical expression of $\Pr[|\mathcal{A}^I(k) \cap \mathcal{A}(k)| = i]$, let us consider the same example as in Table 1.
Table 2 shows the analytical expression of $\Pr[|\mathcal{A}^I(1) \cap \mathcal{A}(1)| = i]$, where $i = 0, 1$.
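For very small instances such as this example, the pmf can also be obtained by brute force, which is a convenient check on the closed-form expressions. The sketch below is an illustration, not the derivation used in the paper: it enumerates every joint outcome of the average scores and applies the random tie-breaking rule exactly. The function name `exact_overlap_pmf` and the input format are assumptions.

```python
from itertools import product
from math import comb

def exact_overlap_pmf(gamma_pmfs, k):
    """Exact pmf of the overlap between the intrinsic and the accepted top-k sets.

    gamma_pmfs[i] is a dict {value: probability} for paper i's average score.
    Papers are indexed in decreasing intrinsic quality, so the intrinsic
    top-k set is {0, ..., k-1}. Boundary ties are broken uniformly at random.
    """
    n_papers = len(gamma_pmfs)
    pmf = [0.0] * (k + 1)
    for outcome in product(*(d.items() for d in gamma_pmfs)):
        values = [v for v, _ in outcome]
        prob = 1.0
        for _, p in outcome:
            prob *= p
        cutoff = sorted(values, reverse=True)[k - 1]
        above = [i for i in range(n_papers) if values[i] > cutoff]
        tied = [i for i in range(n_papers) if values[i] == cutoff]
        slots = k - len(above)                 # papers still to pick among the tied
        top_above = sum(1 for i in above if i < k)
        top_tied = sum(1 for i in tied if i < k)
        other_tied = len(tied) - top_tied
        # The number of intrinsic top-k papers drawn from the tie is hypergeometric.
        for h in range(max(0, slots - other_tied), min(slots, top_tied) + 1):
            ways = comb(top_tied, h) * comb(other_tied, slots - h)
            pmf[top_above + h] += prob * ways / comb(len(tied), slots)
    return pmf

if __name__ == "__main__":
    # Setting of Tables 1 and 2: n = 1, m = 2, N = 3, k = 1.
    g = [{1: 0.2, 2: 0.8}, {1: 0.5, 2: 0.5}, {1: 0.7, 2: 0.3}]
    print(exact_overlap_pmf(g, 1))
```

The loop visits every combination of average scores, so its running time grows exponentially in the number of papers, in line with the complexity discussion that follows.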

Now that we have derived the analytical expression of the pmf of $|\mathcal{A}^I(k) \cap \mathcal{A}(k)|$, it is easy to obtain the analytical
expressions of $E[|\mathcal{A}^I(k) \cap \mathcal{A}(k)|]$ and $Var[|\mathcal{A}^I(k) \cap \mathcal{A}(k)|]$ according to the definitions of expectation and variance.

Examining Eq. (23), we can see that $\Pr[\mathcal{A}(k) = \mathcal{F} \cup \mathcal{G}]$ is an essential part of the analytical expression of the
pmf. The analytical expression of $\Pr[\mathcal{A}(k) = \mathcal{F} \cup \mathcal{G}]$ is given in Eq. (14), which is quite complicated, and these
analytical expressions cannot be reduced to a simple form. Thus, it is not easy to use these analytical results to gain
insight into a conference recommendation system. An alternative is to compute the numerical results of the
pmf of $|\mathcal{A}^I(k) \cap \mathcal{A}(k)|$ based on the analytical expressions in Eq. (23). After we obtain the numerical results of the
pmf of $|\mathcal{A}^I(k) \cap \mathcal{A}(k)|$, we can then compute its expectation and variance. Unfortunately, computing the numerical
results of Eq. (23) is computationally expensive, which is shown in the following theorem.



Theorem 3 The computational complexity of calculating the numerical results of the pmf of $|\mathcal{A}^I(k) \cap \mathcal{A}(k)|$ based on
Eq. (23) is exponential, or $\Omega(2^N)$.



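Proof: Examining Eq. (23), we can see that the calculation of $\Pr[\mathcal{A}(k) = \mathcal{F} \cup \mathcal{G}]$ is the core part of the calculation of
the pmf of $|\mathcal{A}^I(k) \cap \mathcal{A}(k)|$. Assume the running time of calculating $\Pr[\mathcal{A}(k) = \mathcal{F} \cup \mathcal{G}]$ is $t$; then by Eq. (23),
the running time of calculating the pmf is

    $\sum_{i=0}^{k} \binom{k}{i} \binom{N - k}{k - i}\, t = \Theta\left(\binom{N}{k}\, t\right)$. (24)

In the following we analyze the running time of calculating the numerical result of $\Pr[\mathcal{A}(k) = \mathcal{F} \cup \mathcal{G}]$ based on its
analytical expression derived in Eq. (14). Examining Eq. (14), we can see that there are two basic computations in
Eq. (14), of which the first one is

    $\left( \prod_{i \in \mathcal{I}} \Pr[\gamma_i \ge \ell/n] - \prod_{i \in \mathcal{I}} \Pr[\gamma_i \ge (\ell + 1)/n] \right) \prod_{j \in \bar{\mathcal{I}}} \Pr[\gamma_j \le (\ell - 1)/n]$;

let us assume the running time of calculating this basic part is $t_1$. The second basic computation is

    $\prod_{i \in \mathcal{I} \setminus \mathcal{F}} (1 - \Pr[\gamma_i \le \ell/n]) \prod_{j \in \mathcal{F} \cup \mathcal{G}} \Pr[\gamma_j = \ell/n] \prod_{\kappa \in \bar{\mathcal{I}} \setminus \mathcal{G}} \Pr[\gamma_\kappa \le (\ell - 1)/n]$;

let us assume the running time of calculating this basic part is $t_2$. The running time of computing $\Pr[\mathcal{A}(k) = \mathcal{F} \cup \mathcal{G}]$
by Eq. (14) is

    $t = \Theta(np\,t_1 + (2^k - 1)(2^{N-k} - 1)\,np\,t_2)$
    $\ge \Theta(np\,t_1 + 2^{k-1} 2^{N-k-1}\,np\,t_2)$
    $= \Theta(np\,t_1 + 2^{N-2}\,np\,t_2) = \Theta(2^N np\,t_2/4)$.

By letting $t = \Theta(2^N np\,t_2/4)$ in Eq. (24), we obtain the result stated in this theorem. ■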



To illustrate the complexity, consider the number of mathematical operations (i.e., addition, subtraction, multiplication
and division) that we need in the computation of the numerical results of the pmf of $|\mathcal{A}^I(k) \cap \mathcal{A}(k)|$. Table 3
illustrates the computational complexity for four conferences: ACM Recsys'11, ACM Sigcomm'11, WWW'11 and IEEE Infocom'11.
Since the expectation and variance are computed from this pmf, their computational complexity is also
exponential.



Conference          N      k     Acceptance %    Complexity
ACM Recsys'11       110    22    20%
ACM Sigcomm'11      223    32    13%             2^312
WWW'11              658    81    12%
IEEE Infocom'11     1823   291   16%

Table 3: Examples of computational complexity



In summary, we draw the following conclusions from analyzing the above conference recommendation system:

• We can analytically derive the pmf:

    $\Pr[|\mathcal{A}^I(k) \cap \mathcal{A}(k)| = i]$, for $i = 0, 1, \ldots, k$.

• The analytical expression is complex and it is not easy to obtain insights into the underlying recommendation
system.

• Computing the numerical results based on these analytical results is computationally expensive.

3.2 Derivation for the General Case 

For the general case, we can derive the analytical expressions of the pmf, expectation and variance of $|\mathcal{A}^I(k) \cap \mathcal{A}(k)|$
with methods similar to those used in the special case. The analytical expressions for the general case will have a form
similar to the analytical expressions derived in the special case. Furthermore, it is reasonable to expect that in
the general case, there can be different types of papers and that reviewers are not homogeneous (e.g., they may have
different topic preferences). Also, tie-breaking rules may be more complicated than the random rule. Hence we expect
the analytical expressions for the general case to be more complicated, and one may not easily obtain insight by
examining them. Instead, let us focus on finding a practical approach to solve
the general case, one that makes it computationally inexpensive to obtain numerical results of:

    $\Pr[|\mathcal{A}^I(k) \cap \mathcal{A}(k)| = i]$, for $i = 0, 1, \ldots, k$.

In the following section, we present this practical approach and show that it is not only computationally
efficient in computing all probability measures but, more importantly, also provides performance guarantees on the
results.

4 Randomized Algorithm 

In this section, we present a randomized algorithm to evaluate the pmf, expectation and variance of $|A^I(k) \cap A(k)|$. Our randomized algorithm is computationally efficient and comes with a performance guarantee. We use the following notation to describe our algorithm. We define $I(k) \triangleq |A^I(k) \cap A(k)|$. Let $\widehat{\Pr}[I(k) = i]$, $\widehat{E}[I(k)]$ and $\widehat{\mathrm{Var}}[I(k)]$ denote the approximate values of $\Pr[I(k) = i]$, $E[I(k)]$ and $\mathrm{Var}[I(k)]$ respectively. We first present the algorithm, then show its performance guarantee.

4.1 Randomized Algorithm

Our algorithm is stated in Algorithm 1. The main idea of this randomized algorithm is that we first approximate the pmf $\Pr[I(k) = i]$, then we use the approximate value $\widehat{\Pr}[I(k) = i]$ to compute $\widehat{E}[I(k)]$ and $\widehat{\mathrm{Var}}[I(k)]$.

Algorithm 1: Randomized Algorithm

 1: for all $i = 0, \ldots, k$, $\ell_i \leftarrow 0$
 2: for $j = 1$ to $K$ do
 3:    for all $i = 1, \ldots, N$, generate the intrinsic quality for paper $P_i$ by simulating the paper submission process according to a self-selectivity type.
 4:    produce the intrinsic top-$k$ paper set $A^I(k)$.
 5:    assign papers to reviewers, generate the degree of criticality based on the paper-reviewer matching degree.
 6:    for $i = 1, \ldots, N$, generate the score set $S_i$ for paper $P_i$ by simulating the scoring process.
 7:    simulate the decision making process, i.e., apply the voting rule V and the tie breaking rule T to produce the set $A(k)$ based on the score sets $\{S_1, \ldots, S_N\}$.
 8:    if the cardinality of the intersection of $A^I(k)$ and $A(k)$ is equal to $i$, then $\ell_i \leftarrow \ell_i + 1$.
 9: end for
10: for all $i = 0, \ldots, k$, $\widehat{\Pr}[I(k) = i] \leftarrow \ell_i / K$
11: $\widehat{E}[I(k)] \leftarrow \sum_{i=0}^{k} i\, \widehat{\Pr}[I(k) = i]$
12: $\widehat{\mathrm{Var}}[I(k)] \leftarrow \sum_{i=0}^{k} (i - \widehat{E}[I(k)])^2\, \widehat{\Pr}[I(k) = i]$

We can state two properties of this algorithm. The first one is its running time complexity and the other one is its theoretical performance guarantee. The following theorem states its running time complexity.

Theorem 4 The computational complexity of our randomized algorithm is $\Theta(KN \log N)$, where $K$ is the number of simulation rounds and $N$ is the number of submitted papers.

Proof: We prove this theorem by examining the complexity of each step of Algorithm 1. The complexity of steps 10 to 12 and of step 1 is the same, say $\Theta(k)$. The complexity of steps 3 to 8 is $\Theta(N)$, $\Theta(N \log N)$, $\Theta(\sum_{i=1}^{N} n_i)$, $\Theta(\sum_{i=1}^{N} n_i)$, $\Theta(N \log N)$, and $\Theta(1)$ respectively. Since each reviewer only reviews a small subset of the submitted papers, namely $n_i \ll N$, we have $\Theta(\sum_{i=1}^{N} n_i) = \Theta(N)$. Adding up the complexity of steps 1 to 12, with steps 3 to 8 repeated $K$ times, we obtain the theorem. ∎
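The structure of Algorithm 1 can be sketched in a few lines of Python. This is a minimal illustration, not the paper's simulator: the quality model (uniform on [1, m]), the scoring model (quality plus Gaussian noise, rounded and clipped to the rating set), and the random tie breaking are all placeholder assumptions standing in for the self-selectivity and scoring models defined earlier in the paper. The function names and the parameters `sigma` and `seed` are ours.

```python
import random
from collections import Counter


def simulate_round(N, k, n, m, sigma, rng):
    """One simulation round (steps 3 to 8); returns |A^I(k) ∩ A(k)|."""
    # Step 3 (assumed toy model): intrinsic quality uniform on [1, m].
    quality = [rng.uniform(1, m) for _ in range(N)]
    # Step 4: intrinsic top-k set, Theta(N log N) by sorting.
    intrinsic_top = set(
        sorted(range(N), key=lambda i: quality[i], reverse=True)[:k]
    )

    # Steps 5 and 6 (assumed toy model): each of the n reviews is the quality
    # plus Gaussian noise, rounded and clipped to the rating set {1, ..., m}.
    def score(q):
        return min(m, max(1, round(rng.gauss(q, sigma))))

    scores = [[score(q) for _ in range(n)] for q in quality]
    # Step 7: average score rule; ties are broken at random (an assumption).
    key = [(sum(s) / n, rng.random()) for s in scores]
    accepted = set(sorted(range(N), key=lambda i: key[i], reverse=True)[:k])
    # Step 8: size of the overlap between intrinsic and accepted sets.
    return len(intrinsic_top & accepted)


def randomized_algorithm(N, k, n, m=5, sigma=1.0, K=1000, seed=0):
    """Estimate the pmf, expectation and variance of I(k) over K rounds."""
    rng = random.Random(seed)
    counts = Counter(simulate_round(N, k, n, m, sigma, rng) for _ in range(K))
    pmf = [counts[i] / K for i in range(k + 1)]                 # step 10
    mean = sum(i * p for i, p in enumerate(pmf))                # step 11
    var = sum((i - mean) ** 2 * p for i, p in enumerate(pmf))   # step 12
    return pmf, mean, var


if __name__ == "__main__":
    pmf, mean, var = randomized_algorithm(N=200, k=30, n=3, K=500)
    print(f"estimated E[I(30)] = {mean:.2f}, Var[I(30)] = {var:.2f}")
```

Each round is dominated by the two sorts, which matches the $\Theta(N \log N)$ per-round cost in Theorem 4; the numbers this toy model prints are not comparable to the paper's tables.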

The remaining technical issue is how to set the parameter $K$. Specifically, how many simulation rounds $K$ are needed to produce a good approximation of the pmf, expectation and variance? Let us proceed to answer this question by deriving the theoretical performance guarantee for Algorithm 1.

4.2 Theoretical Performance Guarantee 

First, we derive a loose bound on the number of simulation rounds $K$ that is sufficient for a good performance guarantee. Then we show how one can obtain a tight bound on $K$, and the tradeoff between $K$ and the performance guarantee. The following theorem states the loose bound on $K$.



Theorem 5 When the following condition holds:

$$K \ge \max_{i = 0, 1, \ldots, k,\ \Pr[I(k) = i] \neq 0} \frac{3 \ln(2(k+1)/\delta)}{\Pr[I(k) = i]\, \epsilon^2}, \tag{25}$$

then Algorithm 1 guarantees:

$$\big|\widehat{\Pr}[I(k) = i] - \Pr[I(k) = i]\big| \le \epsilon \Pr[I(k) = i], \quad \forall i,$$

$$\big|\widehat{E}[I(k)] - E[I(k)]\big| \le \epsilon E[I(k)],$$

$$\big|\widehat{\mathrm{Var}}[I(k)] - \mathrm{Var}[I(k)]\big| \le \epsilon(1 + \epsilon) \mathrm{Var}[I(k)],$$

with probability at least $1 - \delta$.

Proof: By applying Lemma 4 we obtain that

$$\big|\widehat{\Pr}[I(k) = i] - \Pr[I(k) = i]\big| \le \epsilon \Pr[I(k) = i]$$

holds for all $i = 0, 1, \ldots, k$, with probability at least $1 - \delta$. By applying Lemma 5 and 4 we have that

$$\big|\widehat{E}[I(k)] - E[I(k)]\big| \le \epsilon E[I(k)]$$

holds with probability at least $1 - \delta$. By applying Lemma 2 and 4 we have that

$$\big|\widehat{\mathrm{Var}}[I(k)] - \mathrm{Var}[I(k)]\big| \le \epsilon(1 + \epsilon) \mathrm{Var}[I(k)]$$

holds with probability at least $1 - \delta$. Please refer to the appendix for the proofs of these lemmas. ∎

When one examines Inequality (25), one can see that the bound on $K$ is useful when $\Pr[I(k) = i]$ is not small for $i = 0, 1, \ldots, k$. Consider the case when $\Pr[I(k) = i]$ is exponentially small for some $i = 0, 1, \ldots, k$; then Inequality (25) forces $K$ to be exponentially large. For such cases, $K$ is too large. In the following theorem, we show a tight bound on $K$.

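Theorem 6 When $K \ge 3 \ln(2(k+1)/\delta)/\epsilon^2$, Algorithm 1 guarantees:

$$\big|\widehat{\Pr}[I(k) = i] - \Pr[I(k) = i]\big| \le \epsilon \sqrt{\Pr[I(k) = i]}, \quad \forall i = 0, \ldots, k,$$

$$\big|\widehat{E}[I(k)] - E[I(k)]\big| \le \epsilon \sqrt{k(k+1) E[I(k)]/2},$$

$$\big|\widehat{\mathrm{Var}}[I(k)] - \mathrm{Var}[I(k)]\big| \le \epsilon (k+1) \Big(\epsilon \mathrm{Var}[I(k)] + \sqrt{(2k+1) \mathrm{Var}[I(k)]/6}\Big),$$

with probability at least $1 - \delta$.

Proof: By applying Lemma 9 we obtain that

$$\big|\widehat{\Pr}[I(k) = i] - \Pr[I(k) = i]\big| \le \epsilon \sqrt{\Pr[I(k) = i]}$$

holds for all $i = 0, 1, \ldots, k$, with probability at least $1 - \delta$. By applying Lemma 10 and 9 we have that

$$\big|\widehat{E}[I(k)] - E[I(k)]\big| \le \epsilon \sqrt{k(k+1) E[I(k)]/2}$$

holds with probability at least $1 - \delta$. By applying Lemma 12 and 9 we have that

$$\big|\widehat{\mathrm{Var}}[I(k)] - \mathrm{Var}[I(k)]\big| \le \epsilon (k+1) \Big(\epsilon \mathrm{Var}[I(k)] + \sqrt{(2k+1) \mathrm{Var}[I(k)]/6}\Big)$$

holds with probability at least $1 - \delta$. Detailed proofs of these lemmas are in the appendix. ∎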
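As a rough guide to choosing $K$ in practice, the two sufficient conditions can be evaluated directly. The sketch below assumes the conditions read $K \ge 3\ln(2(k+1)/\delta)/(\Pr[I(k)=i]\,\epsilon^2)$ for Theorem 5 (maximized over $i$, i.e., evaluated at the smallest nonzero probability) and $K \ge 3\ln(2(k+1)/\delta)/\epsilon^2$ for Theorem 6. The function names and the parameter `p_min` are ours, and the value of `p_min` would have to be supplied by the user since it is not known in advance.

```python
import math


def rounds_tight(k: int, eps: float, delta: float) -> int:
    """Smallest integer K with K >= 3 ln(2(k+1)/delta) / eps^2 (Theorem 6)."""
    return math.ceil(3 * math.log(2 * (k + 1) / delta) / eps ** 2)


def rounds_loose(k: int, eps: float, delta: float, p_min: float) -> int:
    """Theorem 5 condition, where p_min is the smallest nonzero Pr[I(k) = i]."""
    return math.ceil(3 * math.log(2 * (k + 1) / delta) / (p_min * eps ** 2))


if __name__ == "__main__":
    k, eps, delta = 30, 0.1, 0.05
    print("Theorem 6 rounds:", rounds_tight(k, eps, delta))
    # A hypothetical smallest probability, chosen only for illustration.
    print("Theorem 5 rounds:", rounds_loose(k, eps, delta, p_min=1e-6))
```

The comparison makes the text's point concrete: the Theorem 5 count scales with $1/p_{\min}$, whereas the Theorem 6 count does not depend on the pmf at all, at the price of weaker (absolute rather than relative) error bounds.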



5 Evaluation of Peer Review System 

In this section we evaluate the accuracy of conference recommendation systems. We consider a conference with two hundred submissions, or $N = 200$, of which only $k = 30$ will be accepted, i.e., a 15% acceptance rate. Consistent with realistic conference review systems, we set $m = 5$, so the rating set is $\{1, \ldots, 5\}$. The number of simulation rounds $K$ in Algorithm 1 is set to 10^. In the following, we start our evaluation from a simple case, then we extend it step by step and evaluate the impact of various factors on the overall accuracy of the final recommendation.

5.1 Probability distribution, expectation and variance of $|A^I(k) \cap A(k)|$



Here we consider a homogeneous conference recommendation system, in which papers and reviewers are homogeneous and each paper is reviewed by the same number of reviewers, or $n_i = n, \forall i$. Specifically, each reviewer is non-biased with the same critical degree, or $c_i^j = c, \forall i, j$. Hence the function $\sigma(c_i^j), \forall i, j$, within Equation (6) takes the same value $\sigma(c)$, and we further set $\sigma(c) = 1$. We select one type of self-selectivity, namely the medium self-selectivity specified by Equation (2), to study here. The voting rule V is the average score rule, and the tie breaking rule is the least variance rule, which selects the paper whose scores have the least variance. If there is still a tie, a paper is selected at random.

Definition 1 The reviewing workload of a conference recommendation system is the sum of all reviews, or $\sum_{i=1}^{N} n_i$.

Definition 2 Let $I_i$ be the random variable indicating

$$I_i = |A^I(i) \cap A(k)|. \tag{26}$$

In other words, if $i = 1$, $I_1$ reflects the event that the best paper is accepted by this conference recommendation system. When $i = 5$, $I_5$ counts how many of the top five submitted papers are finally accepted.
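Both definitions translate directly into code. The sketch below is only illustrative: it assumes papers are identified by integer ids, that the intrinsic ranking is given best-first, and the function names are ours.

```python
def reviewing_workload(reviews_per_paper):
    """Definition 1: total number of reviews, i.e. the sum of n_i over papers."""
    return sum(reviews_per_paper)


def top_i_accepted(intrinsic_ranking, accepted, i):
    """Definition 2: I_i = |A^I(i) ∩ A(k)|.

    intrinsic_ranking lists paper ids from best to worst intrinsic quality;
    accepted is the collection of accepted paper ids, A(k).
    """
    return len(set(intrinsic_ranking[:i]) & set(accepted))


if __name__ == "__main__":
    # 200 papers with 3 reviews each, as in the homogeneous setting.
    print(reviewing_workload([3] * 200))
    # Hypothetical ranking and accepted set, for illustration only.
    ranking = [7, 2, 9, 4, 5]
    accepted = {2, 4, 5}
    print([top_i_accepted(ranking, accepted, i) for i in (1, 2, 5)])
```

In the hypothetical example the best paper (id 7) is rejected, so $I_1 = 0$, while three of the top five are accepted.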

The numerical results of the pmf of $|A^I(30) \cap A(30)|$ are shown in Fig. 1. The numerical results of the expectation and variance of $I_1$, $I_5$, $I_{10}$ and $I_{30}$ are shown in Table 4.

In Fig. 1, the horizontal axis represents the number of top 30 papers that are finally accepted, or $|A^I(30) \cap A(30)|$. The vertical axis shows the corresponding probability. From Fig. 1 we see that as we increase the number of reviews per paper, $n$, the probability mass function shifts toward the right. In other words, the more reviews each paper receives, the higher the accuracy of the conference recommendation system. From Table 4, we have the following observations. When each paper is reviewed by three reviewers, approximately 19.8 of the top 30 papers are accepted. It is interesting to note that the chance of accepting the best paper is nearly invariant to the reviewing workload, since $E[I_1] = 0.98$ when $n = 3$ and it improves only to 0.9999 when $n = 10$. A similar statement holds for the top ten papers. From Table 4 we can also see that increasing $n$ decreases the variance, which reflects that the conference recommendation system becomes more accurate.

Lessons learned: If reviewers are non-biased, fair and critical, and the quality of submitted papers is of medium self-selectivity as specified by Eq. (2), we have a fairly accurate conference recommendation system. There are a number of interesting questions to explore further, e.g., are these results dependent on the distribution of intrinsic quality, i.e., the self-selectivity type of papers? Are these results sensitive to the voting rule? Let us continue to explore.




Figure 1: pmf of $|A^I(30) \cap A(30)|$ when n = 3, 4, 6, 8, 10 and papers are submitted with medium self-selectivity.

             n = 3      n = 4     n = 6     n = 8     n = 10
E[I_1]       0.9832     0.9931    0.9986    0.9996    0.9999
E[I_5]       4.6854     4.8284    4.9352    4.9723    4.9869
E[I_10]      8.7763     9.1643    9.5576    9.7446    9.8426
E[I_30]      19.8270    20.960    22.400    23.328    23.980
Var[I_1]     0.0165     0.0067    0.0014    0.0004    0.0001
Var[I_5]     0.2942     0.1701    0.0654    0.0280    0.0132
Var[I_10]    1.0793     0.7793    0.4370    0.2586    0.1609
Var[I_30]    4.1210     3.7710    3.2934    2.9580    2.7121

Table 4: Expectation and variance of $I_1$, $I_5$, $I_{10}$, and $I_{30}$ when we increase the number of reviews per paper, n = 3, 4, 6, 8, 10, and papers are submitted with medium self-selectivity.
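The $E[I_{30}]$ row of Table 4 can be turned into two simple derived quantities: the expected fraction of the top 30 papers that is accepted, and the average gain in $E[I_{30}]$ per additional review. The short sketch below does this arithmetic on the table values; the function names are ours, and the derived numbers are only as reliable as the table entries they are computed from.

```python
# E[I_30] for n = 3, 4, 6, 8, 10 reviews per paper, copied from Table 4.
EXPECTED_TOP30 = {3: 19.8270, 4: 20.960, 6: 22.400, 8: 23.328, 10: 23.980}


def accepted_fraction(expected_hits: float, top: int = 30) -> float:
    """Expected fraction of the intrinsic top papers that is accepted."""
    return expected_hits / top


def marginal_gain_per_review(table):
    """Average gain in E[I_30] per extra review between consecutive n."""
    ns = sorted(table)
    return {(a, b): (table[b] - table[a]) / (b - a) for a, b in zip(ns, ns[1:])}


if __name__ == "__main__":
    for n, hits in EXPECTED_TOP30.items():
        print(f"n = {n}: fraction accepted = {accepted_fraction(hits):.3f}")
    for (a, b), gain in marginal_gain_per_review(EXPECTED_TOP30).items():
        print(f"n = {a} -> {b}: gain per extra review = {gain:.3f}")
```

On these values the per-review gain shrinks as $n$ grows, so each additional review buys less accuracy than the previous one.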

5.2 Effect of Intrinsic Quality 

Here, we explore the effect of intrinsic quality (or self-selectivity) of papers on the conference recommendation system. Specifically, we consider four representative types of self-selectivity of papers: high, medium, low and random self-selectivity, specified by Eq. (1) to (4) respectively. We explore the effect of self-selectivity of papers on the homogeneous conference recommendation system specified in Section 5.1, with each paper receiving three reviews, or $n = 3$. We use the following notation to present our results:

H-S-S: high self-selectivity specified in Eq. (1).

M-S-S: medium self-selectivity specified in Eq. (2).

L-S-S: low self-selectivity specified in Eq. (3).

R-S-S: random self-selectivity specified in Eq. (4).

The numerical results of the pmf of $|A^I(30) \cap A(30)|$ are shown in Fig. 2. The numerical results of the expectation and variance of $I_1$, $I_5$, $I_{10}$ and $I_{30}$ are shown in Table 5.

In Fig. 2, the horizontal axis represents the number of top 30 papers that are finally accepted, or $|A^I(30) \cap A(30)|$. The vertical axis shows the corresponding probability. From Fig. 2 we see that as the self-selectivity type varies in the order of high, low, medium, random, the corresponding probability mass function shifts toward the right. In other words, the accuracy of the conference recommendation system is highest for random self-selectivity, followed by medium, low and high self-selectivity. From Table 5 we have the following observations. When papers are submitted with medium, low or random self-selectivity, around 20 of the top 30 papers are accepted. It is interesting to note that the chance of accepting the best paper is nearly invariant across these three self-selectivity types, since the corresponding three values of $E[I_1]$ are all around 0.98. A similar statement holds for the top ten papers. But when papers are submitted with high self-selectivity, the accuracy of the conference recommendation system is remarkably lower than for the other three self-selectivity types. Specifically, only around 13.2 of the top 30 papers are accepted. Even the best paper is rejected with high probability, around 0.46. A similar statement holds for the top ten or top five papers. The variance corresponding to high self-selectivity is the highest among the four, which reflects that the results of the conference recommendation system are the least accurate and the most likely to depart from the expectation. Hence, for a prestigious conference, e.g., SIGCOMM, assuming papers are submitted with high self-selectivity, if the conference insists on a small technical program committee whose reviewers have a moderate reviewing workload, say $n = 3$, the final recommendation may not be accurate.

Lessons learned: When reviewers are non-biased, fair and critical, the above simple conference review system is quite accurate except when papers are of high self-selectivity. In that case, one may explore other means to improve the accuracy. Again, there are a number of interesting questions to explore, e.g., how large a reviewing workload do we need to have an accurate recommendation?

Figure 2: pmf of $|A^I(30) \cap A(30)|$ when n = 3 and papers are submitted with high, medium, low or random self-selectivity.

             H-S-S      M-S-S      L-S-S      R-S-S
E[I_1]       0.5401     0.9832     0.9788     0.9846
E[I_5]       2.6250     4.6950     4.6082     4.7599
E[I_10]      5.0687     8.7736     8.5162     9.0810
E[I_30]      13.2258    19.8270    18.9798    21.5821
Var[I_1]     0.2437     0.0165     0.0208     0.0151
Var[I_5]     1.2236     0.2942     0.3698     0.2314
Var[I_10]    2.3933     1.0793     1.2491     0.8275
Var[I_30]    5.4420     4.1210     4.2888     3.5473

Table 5: Expectation and variance of $I_1$, $I_5$, $I_{10}$, and $I_{30}$ when n = 3 and papers are submitted with high, medium, low or random self-selectivity.

5.3 Effect of Number of Reviews per Paper

Here we consider the homogeneous conference recommendation system specified in Section 5.1. We select one probability measure, the expectation of acceptance, to study. The numerical results of $E[I_1]$, $E[I_5]$, $E[I_{10}]$ and $E[I_{30}]$ are shown in Fig. 3.

In Fig. 3, the horizontal axis represents the number of reviews per paper, $n$. The vertical axis shows the corresponding expectation. From Fig. 3, we have the following observations. When we increase the number of reviews per paper, the expectation increases, which reflects the improvement in accuracy of the conference recommendation system. As the self-selectivity type varies in the order of high, low, medium, random, the expectation curve shifts upward. In other words, the accuracy corresponding to random self-selectivity is the highest, followed by medium, low and high self-selectivity. It is interesting to note that the chance of accepting the best paper is nearly invariant to the workload, except for papers with high self-selectivity. A similar statement holds for the top five or ten papers. When papers are highly self-selective, the accuracy of the conference recommendation system is remarkably lower than for the other three self-selectivity types. In particular, when the reviewing workload is low, say $n = 3$, fewer than 15 of the top 30 papers are accepted. This holds even for the best paper, which has a probability of less than 0.6 of being accepted when $n = 3$. The same can be said for the top five or top ten papers. In closing, for papers with high self-selectivity, we may have to increase the reviewing workload to at least $n \ge 7$ to have a strong guarantee that the best paper will be accepted.

Lessons learned: If reviewers are non-biased and fair, the above simple conference review system achieves relatively high accuracy except when papers are of high self-selectivity. In that case, we have to increase the workload to $n \ge 7$ to improve the system. Again, there are a number of interesting questions to explore, e.g., is increasing the workload the only way to improve the conference recommendation system? Can we improve it by using a different voting rule or tie breaking rule?

Figure 3: Expectation of $I_1$, $I_5$, $I_{10}$, and $I_{30}$ when we vary the number of reviews per paper n = 2, . . . , 12, with papers of high, low, medium or random self-selectivity.



5.4 Effect of Voting Rules 

In this section we explore the effect of voting rules on the accuracy of the conference recommendation system. Specifically, we evaluate the performance of the following three representative voting rules:

Average score rule ($V_{as}$): the score of each paper is the average of its review scores, or

$$V_{as}(S_i) = \frac{1}{|S_i|} \sum_{s \in S_i} s. \tag{27}$$

Eliminate the highest & lowest score rule ($V_{ehl}$): eliminate the highest and lowest score of each paper, and calculate the average score of each paper from the remaining scores, or

$$V_{ehl}(S_i) = \frac{\sum_{s \in S_i} s - \max_{s \in S_i} s - \min_{s \in S_i} s}{|S_i| - 2}. \tag{28}$$

Punish low scores rule ($V_{pl}$): punish the low scores. Specifically, a low score, i.e., a score of 1, brings an extra punishment of decreasing the score by $\eta$, or

$$V_{pl}(S_i) = \sum_{s \in S_i} s / |S_i| - \eta\, |\{s \mid s = 1, s \in S_i\}|. \tag{29}$$

Let us set the punishment $\eta$ to be 0.5 throughout this paper.

We use the least variance rule for tie breaking and we choose expectation as our performance measure. We evaluate the accuracy of these three voting rules on the conference recommendation system specified in Section 5.1. The numerical results of the expectation of $I_{30}$ are shown in Fig. 4. The numerical results of the expectation of $I_1$ and $I_5$ for high self-selectivity submissions are shown in Fig. 5. The numerical results of $E[I_1]$, $E[I_5]$, and $E[I_{10}]$ when $n = 3$ are shown in Table 6.

                                      V_as      V_ehl     V_pl
High self-selectivity      E[I_1]     0.5793    0.5793    0.5793
                           E[I_5]     2.8078    2.8078    2.8077
                           E[I_10]    5.4028    5.4026    5.4026
Medium self-selectivity    E[I_1]     0.9832    0.9911    0.9832
                           E[I_5]     4.6950    4.7733    4.6950
                           E[I_10]    8.7763    8.9641    8.7763
Low self-selectivity       E[I_1]     0.9788    0.9741    0.9746
                           E[I_5]     4.6082    4.5616    4.5691
                           E[I_10]    8.5162    8.4032    8.4297
Random self-selectivity    E[I_1]     0.9846    0.9873    0.9846
                           E[I_5]     4.7599    4.7937    4.7599
                           E[I_10]    9.0810    9.1791    9.0810

Table 6: $E[I_1]$, $E[I_5]$, and $E[I_{10}]$ for three voting rules: $V_{as}$, $V_{ehl}$, and $V_{pl}$ when n = 3.
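The three voting rules of this section can be sketched as small Python functions. This is one possible reading, not the authors' implementation: in particular, `punish_low` assumes that the punishment $\eta$ is subtracted from the average once for every score equal to 1, which is our interpretation of Eq. (29), and the fallback for papers with fewer than three reviews in `eliminate_high_low` is our own choice.

```python
def average_score(scores):
    """Average score rule: mean of all review scores of a paper."""
    return sum(scores) / len(scores)


def eliminate_high_low(scores):
    """Drop one highest and one lowest score, then average the rest.

    Assumption: with fewer than three scores nothing would remain,
    so we fall back to the plain average.
    """
    if len(scores) < 3:
        return average_score(scores)
    trimmed = sorted(scores)[1:-1]
    return sum(trimmed) / len(trimmed)


def punish_low(scores, eta=0.5, low=1):
    """Average score minus eta for every score equal to `low`.

    Assumption: this is one reading of Eq. (29); the exact normalization
    of the penalty term is not certain from the text.
    """
    penalty = eta * sum(1 for s in scores if s == low)
    return average_score(scores) - penalty


if __name__ == "__main__":
    for scores in ([1, 3, 5], [2, 4, 4, 5]):
        print(scores, average_score(scores),
              eliminate_high_low(scores), punish_low(scores))
```

With three reviews per paper, `eliminate_high_low` reduces to taking the median score, which is worth keeping in mind when comparing it with the other rules at $n = 3$.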

In Fig. 4 and 5, the horizontal axis represents the number of reviews per paper, $n$. The vertical axis shows the corresponding expectation. From Fig. 4, we have the following observations. When we increase $n$, the expectation $E[I_{30}]$ corresponding to each voting rule increases, which reflects that the accuracy of the conference recommendation system increases. From Fig. 4(a) we see that when submitted papers are of high self-selectivity, the expectation curves overlap. In other words, for high self-selectivity papers, these three voting rules perform similarly. From Fig. 4(b) we see that for submitted papers with medium self-selectivity, the average score rule and the punish low scores rule have the same degree of accuracy, and the eliminate highest and lowest score rule has slightly higher accuracy than the other two voting rules. This statement also holds for the random self-selectivity submissions, as shown in Fig. 4(d). From Fig. 4(c) we see that when submitted papers are of low self-selectivity, all three voting rules have nearly the same accuracy, but the punish low scores rule has slightly lower accuracy. From Table 6 we observe that when $n = 3$, the chance of accepting the best paper is nearly invariant to these voting rules, except when submitted papers are of high self-selectivity, since $E[I_1]$ is around 0.98 for medium, low and random self-selectivity papers. A similar statement holds for the top five and the top ten papers. When papers are submitted with high self-selectivity, the chance of accepting the best paper is low; in fact, it is less than 0.6. The same holds for the top five and top ten papers. From Fig. 5 we see that when papers are submitted with high self-selectivity, the expectation curves overlap, and for each voting rule we have to increase the reviewing workload to at least seven reviews per paper to have a strong guarantee that the best paper or the top five papers will be accepted.

Lessons learned: These three voting rules have comparable accuracy; no rule outperforms the others remarkably. Thus the improvement of the conference recommendation system achievable through voting rules is limited. For each voting rule, we still have to increase the average workload to at least seven reviews per paper to have a strong guarantee that the best paper will be accepted. Again, there are a number of interesting questions to explore, e.g., how about the tie breaking rules?

5.5 Effect of Tie Breaking Rules 

In this section we explore the effect of tie breaking rules on conference recommendation system. Specifically, we 
evaluate the performance of the following four representative tie breaking rules: 

Least variance ($T_{var}$): select the one with the least variance; if there is still a tie, perform random selection.

Largest max score ($T_{maxs}$): select the one with the largest max score; if there is still a tie, perform random selection.

Largest min score ($T_{mins}$): select the one with the largest min score; if there is still a tie, perform random selection.

Largest medium score ($T_{meds}$): select the one with the largest medium score; if there is still a tie, perform random selection.

Figure 4: $E[I_{30}]$ for three voting rules: $V_{as}$, $V_{ehl}$, and $V_{pl}$.

Figure 5: Expectation $E[I_1]$, $E[I_5]$ for three voting rules: $V_{as}$, $V_{ehl}$, and $V_{pl}$ for high self-selectivity submissions.

                                      T_var     T_maxs    T_mins    T_meds
High self-selectivity      E[I_1]     0.5793    0.5793    0.5793    0.5793
                           E[I_5]     2.8078    2.8077    2.8077    2.8078
                           E[I_10]    5.4026    5.4026    5.4026    5.4026
Medium self-selectivity    E[I_1]     0.9832    0.9880    0.9832    0.9907
                           E[I_5]     4.6951    4.7514    4.6951    4.7755
                           E[I_10]    8.7764    8.9311    8.7765    8.9831
Low self-selectivity       E[I_1]     0.9788    0.9787    0.9788    0.9886
                           E[I_5]     4.6083    4.5989    4.6074    4.6040
                           E[I_10]    8.5164    8.4723    8.5102    8.3990
Random self-selectivity    E[I_1]     0.9846    0.9858    0.9846    0.9872
                           E[I_5]     4.7599    4.7747    4.7599    4.7926
                           E[I_10]    9.0810    9.1239    9.0810    9.1761

Table 7: $E[I_1]$, $E[I_5]$, and $E[I_{10}]$ for four tie breaking rules: $T_{var}$, $T_{maxs}$, $T_{mins}$, and $T_{meds}$ when n = 3.
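The four tie breaking rules can be combined with the average score rule into a single ranking step. The sketch below is a hedged illustration: it reads the "medium score" as the median, uses the population variance for the least variance rule, and resolves any remaining tie with a random number, all of which are our assumptions. The names `TIE_BREAK_KEYS` and `select_top_k` are ours.

```python
import random
import statistics

# Secondary sort keys; larger is better, so variance is negated.
TIE_BREAK_KEYS = {
    "var": lambda s: -statistics.pvariance(s),   # least variance
    "maxs": max,                                 # largest max score
    "mins": min,                                 # largest min score
    "meds": statistics.median,                   # assumed: median score
}


def select_top_k(score_sets, k, rule="var", rng=None):
    """Rank papers by average score, break ties with `rule`, then at random.

    score_sets[i] is the list of review scores of paper i; the function
    returns the indices of the k accepted papers, best first.
    """
    rng = rng or random.Random(0)
    tie_key = TIE_BREAK_KEYS[rule]

    def sort_key(i):
        s = score_sets[i]
        return (sum(s) / len(s), tie_key(s), rng.random())

    ranked = sorted(range(len(score_sets)), key=sort_key, reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Papers 0, 1 and 2 all average 3.0; paper 3 is the clear winner.
    score_sets = [[3, 3, 3], [2, 3, 4], [1, 3, 5], [5, 5, 5]]
    for rule in ("var", "maxs", "mins"):
        print(rule, select_top_k(score_sets, 2, rule))
```

In the example, the second accepted paper depends on the rule: the least variance and largest min score rules prefer the unanimous paper 0, while the largest max score rule prefers the polarized paper 2.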

To evaluate the performance of these four tie breaking rules, let us select one voting rule, the average score rule, and use expectation as our performance measure. We evaluate the performance of these four tie breaking rules on the conference recommendation system specified in Section 5.1. The numerical results of $E[I_{30}]$ are shown in Fig. 6. The numerical results of $E[I_1]$ and $E[I_5]$ for high self-selectivity papers are shown in Fig. 7. The numerical results of $E[I_1]$, $E[I_5]$, and $E[I_{10}]$ when $n = 3$ are shown in Table 7.

In Fig. 6 and 7, the horizontal axis represents the number of reviews per paper, $n$. The vertical axis shows the corresponding expectation. From Fig. 6, we have the following observations. When we increase the reviewing workload, the expectation $E[I_{30}]$ corresponding to each tie breaking rule increases, which reflects that the accuracy of the conference recommendation system increases. From Fig. 6(a) and 6(c) we see that when submitted papers are of high or low self-selectivity, the expectation curves corresponding to these four tie breaking rules overlap. In other words, these four rules have the same accuracy. From Fig. 6(b) and 6(d) we see that when submitted papers are of medium or random self-selectivity, the expectation curves corresponding to these four tie breaking rules bunch together, and the largest medium score rule has slightly higher accuracy than the others. From Table 7 we see that when $n = 3$, the chance of accepting the best paper is nearly invariant to the tie breaking rule, except when submitted papers are of high self-selectivity, since the values of $E[I_1]$ corresponding to medium, low or random self-selectivity are all around 0.98. A similar statement holds for the top five and top ten papers. When submitted papers are of high self-selectivity, the chance of the best paper being accepted is low, less than 0.6. The same holds for the top five and top ten papers. From Fig. 7 we see that for each tie breaking rule, we still have to increase the reviewing workload $n$ to at least 7 to have a strong guarantee that the best paper or the top five papers will be accepted.

Lessons learned: These four tie breaking rules have comparable accuracy; no rule outperforms the others remarkably. Thus, the improvement of the conference recommendation system achievable through tie breaking rules is limited. For each tie breaking rule, we still have to increase the reviewing workload $n$ to at least seven to have a strong guarantee that the best paper will be accepted.

5.6 Effect of Reviewers Type - two types case 

We extend the homogeneous conference recommendation system specified in Section 5.1 to a heterogeneous conference recommendation system, in which papers and reviewers can be of different types. Specifically, we consider papers and reviewers of two types (e.g., system or theory). If a paper and its reviewer are of the same type, the critical degree is high; otherwise the critical degree is low. Here we explore how the fraction of paper-reviewer pairs that match in type affects the accuracy of the recommendation. We use expectation as our performance measure and we set the number of reviews per paper in this heterogeneous recommendation system to four, or $n = 4$. We vary the fraction of paper-reviewer pairs that match in the same type from 0.1 to 1. Before showing our results, let us specify $\sigma(c_i^j)$ in Eq. (6): if a paper and its reviewer match in the same type, $\sigma(c_i^j) = 0.5$, otherwise $\sigma(c_i^j) = 2$. The numerical results of $E[I_1]$, $E[I_5]$, $E[I_{10}]$, and $E[I_{30}]$ are shown in Fig. 8.

Figure 6: Expectation $E[I_{30}]$ for four tie breaking rules: $T_{var}$, $T_{maxs}$, $T_{mins}$, and $T_{meds}$.

Figure 7: Expectation $E[I_1]$, $E[I_5]$ for four tie breaking rules: $T_{var}$, $T_{maxs}$, $T_{mins}$, and $T_{meds}$ for high self-selectivity submissions.

In Fig. 8, the horizontal axis represents the fraction of paper-reviewer pairs that match in the same type. The vertical axis shows the corresponding expectation. From Fig. 8, we have the following observations. When we increase the matching fraction, the expectations $E[I_1]$, $E[I_5]$, $E[I_{10}]$, and $E[I_{30}]$ increase. In other words, the accuracy of the conference recommendation system increases when more paper-reviewer pairs match in the same type. It is interesting to see that $E[I_1]$, $E[I_5]$, $E[I_{10}]$ and $E[I_{30}]$ increase at a linear rate, and that the rate is highest for high self-selectivity and lowest for random self-selectivity. When papers are submitted with medium, low or random self-selectivity, the chance of accepting the best paper is nearly invariant to the matching fraction, since the corresponding values of $E[I_1]$ are already close to 1 at the lowest matching fraction and increase only slightly as the matching fraction increases to 1. A similar statement holds for the top five papers. When submitted papers are highly self-selective, the accuracy of the system is very sensitive to the matching fraction: $E[I_1] \approx 0.5$, $E[I_{10}] \approx 5$ and $E[I_{30}] \approx 13$ when the matching fraction is 0, whereas $E[I_1] \approx 0.9$, $E[I_{10}] \approx 8$ and $E[I_{30}] \approx 20$ when the matching fraction is 1.

Lessons learned: How well reviewers' preferences match the papers they review can significantly affect the accuracy of the conference recommendation system. This is especially true for submitted papers of high self-selectivity (or prestigious conferences). It is therefore especially important to select technical program committee members whose expertise matches the research topics, since this can greatly influence the accuracy of the final recommendation.

Figure 8: Impact of paper-reviewer matching

5.7 Effect of Reviewers type - many types case

We now generalize the types of papers or reviewers of a conference recommendation system specified in Section |576l 
to many types. We explore the effect of reviewer types on the accuracy of different voting rules. Specifically, we 
consider the following four representative voting rules: Average score (Vas), Eliminate the highest & lowest (Vehi) > 
Punish low scores (Vpi) as given by Equation ( |27] i — ( |29] l. We also consider the Weighted average (Vwa) rule: The 
average score of paper Pi is weighted on ej, where j = 1, . . . , n;, the declared expertise level of reviewer TZ{S'j) who 
selects topics related to Pi, or 



We consider the least variance rule for tie breaking and we use expectation as our performance measure. In this 
evaluation, papers and reviewers are randomly matched. Before showing our results, let us specify two functions that 
related to model for reviewers types: the first one is cr(c* ) within the probability distribution for score Sj derived in 
Eq. (|6]), which is specified by the following linear function 



the second one is a monotonic increasing function /(/i) within Eq. (|9]l, which is specified by the following linear 
function 



In Fig. |9] the horizontal axis represents the number of reviews per paper, or n. The vertical axis shows the 
corresponding expectation. From Fig. |9] we have the following observations. When submitted papers are of high 
self-selectivity , the expectation curves corresponding to these four voting rules overlapped together In other words, 
these four rules have the same accuracy for high self-selectivity submissions. When submitted papers of low self- 
selectivity, the punish low scores rule has the lowest accuracy than the other three voting rules which have nearly the 
same accuracy. When submitted papers of medium self-selectivity, the weighted average scoring rule and the eliminate 
highest and lowest score rule have nearly the same accuracy, and the weighted average scoring rule has slightly higher 
accuracy than the average scoring rule and punish low scores rule . This statement also holds for the submitted papers 
with random self-selectivity . 

Lessons learned: When papers and reviewers are of many types and papers and reviewers are randomly matched, 
weighted average score rule can have slightly higher accuracy than average score rule and punish low scores rule 
, and it has nearly the same accuracy with eliminate highest and lowest score rule. We have the following similar 
observations in Section l54l these four voting rules have comparable accuracy, no rule can outperform others, thus the 
improvement of conference recommendation system by voting rules is limited. Again, there are number of interesting 
questions to explore, i.e., is the accuracy sensitive to anomaly behavior? 

5.8 Effect of Anomaly Behavior - Random Scoring 

Let us explore the effect of anomaly behavior on the accuracy of a conference recommendation system. Specifically,
we consider one potential anomaly behavior, random scoring, under which a misbehaving reviewer gives a random score
to any paper she reviews regardless of the quality of that paper. We vary the fraction of misbehaving reviewers
from 0.1 to 1 and use expectation as our performance measure. We explore the effect of this anomaly behavior on the
conference recommendation system specified in Section 5.1 with number of reviews per paper n = 4. The numerical
results of $E[I_1]$, $E[I_5]$, $E[I_{10}]$, and $E[I_{30}]$ are shown in Fig. 10.
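To make the random scoring behavior concrete, the following is a minimal Python sketch rather than the model evaluated in the paper. It assumes a score scale of 1..m with m = 5 and a hypothetical honest reviewer who simply reports the evaluated quality; the names `review_score`, `collect_scores`, and `frac_misbehaving` are illustrative only.

```python
import random


def review_score(quality, misbehaving, m, rng):
    """Return one review score on the scale 1..m.

    quality: evaluated quality of the paper (assumed to be an integer in 1..m).
    misbehaving: True for a random-scoring reviewer.
    """
    if misbehaving:
        # Random scoring: the score ignores the quality of the paper.
        return rng.randint(1, m)
    # Hypothetical honest reviewer (an assumption): reports the quality unchanged.
    return quality


def collect_scores(quality, n, frac_misbehaving, m=5, seed=0):
    """Draw n reviews; each reviewer misbehaves with probability frac_misbehaving."""
    rng = random.Random(seed)
    return [
        review_score(quality, rng.random() < frac_misbehaving, m, rng)
        for _ in range(n)
    ]
```

With `frac_misbehaving = 0` every score equals the evaluated quality, and with `frac_misbehaving = 1` the scores carry no information about the paper, which corresponds to the two ends of the horizontal axis in the experiment.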

In Fig. 10, the horizontal axis represents the fraction of misbehaving reviewers and the vertical axis shows the
corresponding expectation. From Fig. 10 we make the following observations. As the fraction of misbehaving
reviewers increases, the expectation decreases; in other words, the more misbehaving reviewers there are, the lower
the accuracy of the conference recommendation system. It is interesting to note that the accuracy decreases at a
nearly linear rate. From Fig. 10(a) we see that for low self-selectivity papers, the chance of accepting the best
paper can withstand a small fraction of misbehaving reviewers, say around 10%. This statement also holds for the
top five papers. When submitted papers are of high self-selectivity, around 40% of misbehaving reviewers can
drastically disrupt the accuracy of a conference recommendation system: in this case, fewer than ten of the top 30
papers and fewer than four of the top ten papers will get accepted, and even the best paper is accepted with
probability less than 0.4. When papers are submitted with medium, low, or random self-selectivity, around 60% of
misbehaving reviewers can drastically disrupt the accuracy of a conference recommendation system.

a(c}) =0.5 + 1.5[l-cj],

/(m) = Ai-

We set I to be 3, thus G {1,2, 3}. The numerical results of $E[I_{30}]$ are shown in Fig. 9.

Figure 9: Expectation of $I_{30}$ for four voting rules: $V_{as}$, $V_{ehl}$, $V_{pl}$ and the weighted average scoring rule.

An interesting question is which voting rule is more robust against this type of misbehaving reviewer. Here we
evaluate three voting rules: the average score rule, the eliminate highest and lowest score rule, and the punish low
scores rule, given by Equations (27)-(29). We use expectation as our performance measure. The numerical results of
$E[I_{30}]$ are shown in Fig. 11.
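As an illustration of why trimming can help against a single outlying score, here is a short Python sketch of two of the rules named above. Equations (27)-(29) are not reproduced in this part of the paper, so the definitions below are assumptions: the average score rule is taken to be the arithmetic mean, and the eliminate highest and lowest score rule is taken to be the mean after dropping one highest and one lowest score. The punish low scores rule is left out because its exact form cannot be inferred from this section.

```python
def average_score(scores):
    """Assumed average score rule: arithmetic mean of all review scores."""
    return sum(scores) / len(scores)


def eliminate_highest_lowest(scores):
    """Assumed trimmed rule: drop one highest and one lowest score, then average."""
    if len(scores) < 3:
        # Assumption: with fewer than three scores nothing is dropped.
        return average_score(scores)
    trimmed = sorted(scores)[1:-1]
    return sum(trimmed) / len(trimmed)
```

For example, with n = 4 scores `[1, 4, 4, 5]`, where the 1 might come from a misbehaving reviewer, the mean is 3.5 while the trimmed mean is 4.0.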

In Fig. 11, the horizontal axis represents the fraction of misbehaving reviewers and the vertical axis shows the
corresponding expectation. From Fig. 11 we make the following observations. As the fraction of misbehaving
reviewers increases, the expectation decreases. From Fig. 11(c) we see that when submitted papers are of low
self-selectivity, the expectation curves of the three voting rules overlap; in other words, for low self-selectivity
papers, the three voting rules have the same robustness. From Fig. 11(b) and Fig. 11(d) we see that when submitted
papers are of medium or random self-selectivity, the eliminate highest and lowest score rule is slightly more robust
than the other two rules. From Fig. 11(a) we observe that when submitted papers are of high self-selectivity, the
three voting rules have the same robustness when the fraction of misbehaving reviewers is less than 30%; when it is
higher than 30%, the eliminate highest and lowest score rule is more robust than the other two rules.

Lessons learned: Random anomaly behavior can significantly affect the accuracy of the conference recommendation
system. This is especially true for submitted papers of high self-selectivity (i.e., prestigious conferences): the
system suffers significantly from this kind of anomaly behavior, e.g., around 20% of misbehaving reviewers reduce the
probability that the best paper is accepted to around 0.5. The voting rules have comparable robustness and no rule
remarkably outperforms the others, so defending against this kind of anomaly behavior through voting rules may not
be effective. Again, there are a number of interesting questions to explore, e.g., how can the accuracy of the
conference recommendation system be improved?

Figure 10: Impact of random anomaly behavior when n = 4.



5.9 Effect of Anomaly Behavior - Bias Scoring 

Let us explore the effect of another potential anomaly behavior on the accuracy of a conference recommendation
system. Specifically, we consider bias scoring, under which a misbehaving reviewer gives the high score m to a paper
whose evaluated quality is low, say less than three, and gives the low score 1 to a paper whose evaluated quality is
above three. We vary the fraction of misbehaving reviewers from 0 to 0.3 and use expectation as our performance
measure. We explore the effect of this type of anomaly behavior on the conference recommendation system specified in
Section 5.1 with number of reviews per paper n = 4. The numerical results of $E[I_5]$, $E[I_{10}]$, and $E[I_{30}]$
are shown in Fig. 12.
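The bias scoring behavior can be written as a small function. This is a sketch under stated assumptions: the scale is 1..m with m = 5, the threshold is three as in the text, and a paper whose evaluated quality equals the threshold is scored honestly, a case the description leaves open. The name `bias_score` is illustrative.

```python
def bias_score(evaluated_quality, m=5, threshold=3):
    """Score given by a bias-scoring reviewer on the scale 1..m."""
    if evaluated_quality < threshold:
        # Low-quality paper: the misbehaving reviewer gives the highest score m.
        return m
    if evaluated_quality > threshold:
        # High-quality paper: the misbehaving reviewer gives the lowest score 1.
        return 1
    # Assumption: quality exactly at the threshold is reported unchanged.
    return evaluated_quality
```

Unlike random scoring, this behavior pushes scores in the wrong direction for every paper away from the threshold, which is in line with the much sharper accuracy loss reported for it.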

In Fig. 12, the horizontal axis represents the fraction of misbehaving reviewers and the vertical axis shows the
corresponding expectation. From Fig. 12 we make the following observations. When the fraction of misbehaving
reviewers increases slightly, the expectation decreases remarkably. From Fig. 12(a) we see that for low
self-selectivity papers, the chance of accepting the best paper can withstand a small fraction of misbehaving
reviewers, say around 6%, but for the other three self-selectivity types, even a small fraction of misbehaving
reviewers may lead to high inaccuracy. For example, when 6% of reviewers are misbehaving, the best paper has less
than a 70% chance of being accepted for highly self-selective submissions, and less than an 80% chance for
submissions of medium or random self-selectivity. A similar deterioration holds for the top five papers. Around 15%
of misbehaving reviewers can drastically disrupt the accuracy of a conference recommendation system, since fewer
than 15 of the top 30 papers will get accepted.

Figure 11: Robustness of three voting rules: $V_{as}$, $V_{ehl}$, and $V_{pl}$ when n = 4.

Lessons learned: Bias scoring anomaly behavior can significantly affect the accuracy of the conference
recommendation system. A small fraction of this kind of misbehaving reviewers can decrease the accuracy
dramatically. This is especially true for submitted papers of high, medium, or random self-selectivity
(prestigious, medium, or newly started conferences). The conference recommendation system suffers severely from
this kind of anomaly behavior: around 15% of misbehaving reviewers are enough to disrupt its accuracy.

5.10 Improving Conference Recommendation Systems 

In reality, most conferences use a homogeneous review strategy, in which each paper is reviewed by the same number
of reviewers. One obvious advantage of this strategy is its fairness to all papers, but its efficiency in using the
reviewing workload is low. Here we propose a heterogeneous review strategy that uses the workload more efficiently
than the homogeneous review strategy. In other words, with the same reviewing workload W, it has a much higher
chance of including the top k papers in the final acceptance recommendation. Assume that the total reviewing
workload of the homogeneous review strategy is $W = N \cdot n$. Our heterogeneous review strategy works in two rounds:

Round 1: Eliminate half of the submitted papers using only half of the workload. Specifically, each paper receives
$\lfloor n/2 \rfloor$ reviews in the first round. After this reviewing round, apply a voting rule and a tie breaking
rule to eliminate N/2 papers.

Round 2: Select k papers to accept from the remaining N/2 papers. Each paper entering this round receives
$2\lceil n/2 \rceil$ additional reviews. After the reviewing process finishes, combine its reviews from round 1 and
round 2, then apply a voting rule and a tie breaking rule to select the top k papers to accept.

Figure 12: Impact of bias scoring anomaly behavior when n = 4.
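The two rounds can be sketched in Python as follows. This is an illustrative sketch, not the authors' implementation: `review` and `aggregate` are hypothetical callables standing in for the reviewer model and the voting rule, ties are broken by input order (an assumption), the number of papers is assumed even, and n is assumed to be at least 2 so that round 1 produces at least one score per paper.

```python
def heterogeneous_workload(num_papers, n):
    """Total number of reviews used by the two-round strategy."""
    round1 = num_papers * (n // 2)                      # floor(n/2) reviews per paper
    round2 = (num_papers // 2) * 2 * ((n + 1) // 2)     # 2*ceil(n/2) for survivors
    return round1 + round2


def two_round_select(papers, n, k, review, aggregate):
    """Return the k papers accepted by the two-round strategy.

    papers: list of paper ids; review(paper) -> one score;
    aggregate(list_of_scores) -> float (the voting rule).
    """
    # Round 1: floor(n/2) reviews per paper, then keep the better half.
    scores = {p: [review(p) for _ in range(n // 2)] for p in papers}
    ranked = sorted(papers, key=lambda p: aggregate(scores[p]), reverse=True)
    survivors = ranked[: len(papers) // 2]

    # Round 2: 2*ceil(n/2) more reviews per survivor, combined with round 1.
    for p in survivors:
        scores[p] += [review(p) for _ in range(2 * ((n + 1) // 2))]
    final = sorted(survivors, key=lambda p: aggregate(scores[p]), reverse=True)
    return final[:k]
```

For even N the total is N·floor(n/2) + (N/2)·2·ceil(n/2) = N·n, so the sketch uses the same workload W as the homogeneous strategy while giving each surviving paper floor(n/2) + 2·ceil(n/2) reviews.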



Definition 3 Let $E[I_i \mid \mathrm{hom}]$ and $E[I_i \mid \mathrm{hetero}]$ represent the expectation of $I_i$ under
the homogeneous and the heterogeneous review strategy respectively, where $I_i$ is as defined earlier.

Definition 4 The improvement of the heterogeneous review strategy over the homogeneous review strategy is:

$$\Delta E[I_i] = E[I_i \mid \mathrm{hetero}] - E[I_i \mid \mathrm{hom}],$$

and the improvement ratio is:

$$\Delta E[I_i] / E[I_i \mid \mathrm{hom}].$$

We evaluate these two strategies on the conference recommendation system specified in Section 5.1. The numerical
results of $\Delta E[I_{30}]$ and $\Delta E[I_{30}]/E[I_{30} \mid \mathrm{hom}]$ are shown in Fig. 13, where the
horizontal axis represents the average reviewing workload n and the vertical axis shows the corresponding
improvement or improvement ratio. From Fig. 13 we see an improvement of the heterogeneous review strategy over the
homogeneous review strategy. When the reviewing workload is small, say n = 3, at least one more paper from the top
30 papers gets accepted with the heterogeneous review strategy. For papers with high self-selectivity the
improvement is larger: three or more additional papers from the top 30 get accepted. When the average reviewing
workload increases to six, the improvement stabilizes. The improvement is highest when the reviewing workload is
three: around four more papers from the top 30 get accepted for papers submitted with high self-selectivity, and the
improvement ratio for this case is around 30%.
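As a rough consistency check on the two numbers quoted for this case (an inference from the approximate values in the text, not a result reported in the paper), an improvement of about four papers together with an improvement ratio of about 30% implies the following approximate expectations:

```latex
\[
E[I_{30} \mid \mathrm{hom}] \approx \frac{\Delta E[I_{30}]}{0.3} \approx \frac{4}{0.3} \approx 13.3,
\qquad
E[I_{30} \mid \mathrm{hetero}] \approx 13.3 + 4 = 17.3 .
\]
```

Under these approximations, the homogeneous strategy with n = 3 accepts roughly 13 of the top 30 papers for highly self-selective submissions, and the heterogeneous strategy roughly 17.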

An interesting question is how large an average reviewing workload we need under the heterogeneous review strategy.
We apply our heterogeneous strategy to the conference recommendation system specified in Section 5.1 and use
expectation as the performance measure. The numerical results of $E[I_1]$, $E[I_5]$, $E[I_{10}]$, and $E[I_{30}]$
are shown in Fig. 14, where the horizontal axis represents the average reviewing workload n and the vertical axis
shows the corresponding expectation. From Fig. 14 we see that we need to increase the workload to at least five to
have a strong guarantee that the best paper will get accepted. This reduces the average review workload by two
compared with the homogeneous review strategy, for which we need an average reviewing workload of at least seven, as
stated in Section 5.3.

Lessons learned: Our heterogeneous review strategy uses equal or less resource (e.g., reviewing workload) than the
homogeneous review strategy and, at the same time, achieves higher accuracy.




Figure 13: Improvement of the heterogeneous review strategy over the homogeneous review strategy. Panels: (a)
Improvement, (b) Improvement ratio; horizontal axis: average workload (n).



6 Related Work 

In [8, 9, 14, 21], the authors studied peer review systems. Typically, the main issue is the reviewer assignment
problem, which contains three phases: specifying the assignment constraints, computing the matching degree between reviewers
and submissions, and optimizing the assignment with constraints. Disciplines like information retrieval 0, artificial 
intelligence lfmi2Tl and operations research |l5]|9][i4], etc address these assignment problems. 

Authors in [19], [24], [25] worked on group recommendation systems and addressed issues of rating scale, and in
[15] of preference aggregation. Ratings are used to express individuals' preferences, and in [25] the authors stated
that discrete rating scales (a fixed number of rating points) outperform continuous rating scales. Other authors
evaluated the reliability of rating scales and showed evidence that more rating points yield more reliable ratings.
In [19], the authors stated that the best rating scale has around five to ten rating points. Preference aggregation
is the process of merging the preferences of multiple people so as to make recommendations. Basically, aggregation
methods can be divided into two classes based on the preference type: cardinal ranking or ordinal ranking. For the
cardinal ranking case, the weighted average strategy is the most popular, and it is used in PolyLens. The second
class is ordinal ranking, in which each individual's preference is expressed as a ranked list of a subset of the
candidates. In this case, users' preferences are treated as a set of constraints, and a preference aggregation
approach attempts to find recommendations that satisfy the constraints of all users [2, 6, 15]. As far as we know,
our work is the first that studies the mathematical modeling of competitive group recommendation systems and applies
it to peer review systems.

7 Conclusions 

This is the first paper that provides a mathematical model and analysis of a competitive group recommendation system.
We apply it to a conference peer review system and show how various factors may influence the overall accuracy of
the final recommendation. We formally analyze the model, and through this analysis we gain insight into developing
a randomized algorithm that is both computationally efficient and provides performance guarantees on various
performance measures. A number of interesting observations are found; e.g., for a medium tier conference, three
reviews per paper are sufficient to achieve high accuracy in the final recommendation, but for some prestigious
conferences we need at least seven reviews per paper. Lastly, we propose a heterogeneous review strategy that
requires equal or less reviewing workload but can produce a more accurate recommendation than the homogeneous review
strategy. We believe our model and methodology are important building blocks for researchers to study competitive
group recommendation systems.

Figure 14: Expectation of $I_1$, $I_5$, $I_{10}$, and $I_{30}$ with heterogeneous review strategy. Panels: (a) # of
top 1 papers get in, (b) # of top 5 papers get in, (c) # of top 10 papers get in, (d) # of top 30 papers get in;
horizontal axis: average workload (n).

Appendix



Theorem 7 (Chernoff Bound [17]) Let $X_1, \ldots, X_n$ be independent random variables with $X_i = 1$ with
probability $p$ and $X_i = 0$ otherwise. Let $X = \sum_{i=1}^{n} X_i$ and $\mu = E[X] = np$. Then we have

$$\Pr\left[|X - \mu| \ge \epsilon\mu\right] \le 2e^{-\mu\epsilon^2/3}.$$



In the following lemma we derive a loose bound on the number of simulation rounds $K$ that nevertheless gives a good
performance guarantee on the pmf of $I(k)$, i.e., on $\Pr[I(k) = i]$ for all $i$, by carefully applying Theorem 7.



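Lemma 3 When the following condition holds:

$$K \ge \max_{i:\, \Pr[I(k)=i] > 0} \frac{3\ln(2(k+1)/\delta)}{\epsilon^2 \Pr[I(k)=i]} \qquad (31)$$

then for each $i = 0, 1, \ldots, k$,

$$\left|\widehat{\Pr}[I(k)=i] - \Pr[I(k)=i]\right| \le \epsilon \Pr[I(k)=i]$$

holds with probability at least $1 - \delta/(k+1)$.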



Proof: Without loss of generality, consider the performance guarantee on the approximation of $\Pr[I(k)=i]$,
where $i \in \{0, 1, \ldots, k\}$. Our goal is to show

$$\left|\widehat{\Pr}[I(k)=i] - \Pr[I(k)=i]\right| \le \epsilon \Pr[I(k)=i].$$

Let $I_{ij}$ be an indicator random variable defined by

$$I_{ij} = \begin{cases} 1 & \text{if, in the } j\text{-th round, } I(k) = i, \\ 0 & \text{otherwise,} \end{cases}$$

where $j = 1, \ldots, K$. There are two cases to explore:

Case 1: $\Pr[I(k)=i] = 0$. Physically, this means that the event $I(k) = i$ never happens, which results in
$I_{ij} = 0$ for all $j = 1, \ldots, K$. Hence $\widehat{\Pr}[I(k)=i] = \sum_{j=1}^{K} I_{ij}/K = 0$. Then we have

$$\left|\widehat{\Pr}[I(k)=i] - \Pr[I(k)=i]\right| \le \epsilon \Pr[I(k)=i].$$

Case 2: $\Pr[I(k)=i] \ne 0$. Since each round runs independently, the random variables $I_{i1}, \ldots, I_{iK}$ are
independent, with $I_{ij} = 1$ with probability $\Pr[I(k)=i]$ and $I_{ij} = 0$ otherwise. Let
$I_i = \sum_{j=1}^{K} I_{ij}$. Then $E[I_i] = K\Pr[I(k)=i]$. From Algorithm 1 we see that $\widehat{\Pr}[I(k)=i]$,
the approximate value of $\Pr[I(k)=i]$, is given by $\widehat{\Pr}[I(k)=i] = I_i/K$. Then by applying Theorem 7 we have

$$\Pr\left[\left|\widehat{\Pr}[I(k)=i] - \Pr[I(k)=i]\right| > \epsilon \Pr[I(k)=i]\right]
= \Pr\left[\left|I_i - K\Pr[I(k)=i]\right| > \epsilon K \Pr[I(k)=i]\right]
\le 2e^{-K\Pr[I(k)=i]\,\epsilon^2/3},$$

and by substituting $K$ with Inequality (31) we have

$$\Pr\left[\left|\widehat{\Pr}[I(k)=i] - \Pr[I(k)=i]\right| > \epsilon \Pr[I(k)=i]\right] \le \delta/(k+1).$$

Finally, the proof of this lemma can be completed by

$$\Pr\left[\left|\widehat{\Pr}[I(k)=i] - \Pr[I(k)=i]\right| \le \epsilon \Pr[I(k)=i]\right]
\ge 1 - \Pr\left[\left|\widehat{\Pr}[I(k)=i] - \Pr[I(k)=i]\right| > \epsilon \Pr[I(k)=i]\right]
\ge 1 - \delta/(k+1).$$

This lemma is proved. ■
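The estimator analyzed in this proof, the fraction of the K independent rounds in which I(k) = i, can be sketched in Python as below. Algorithm 1 itself is not shown in this appendix, so `simulate_round` is a hypothetical stand-in for one simulation round, and `toy_round` is a purely illustrative model in which each of k top papers is accepted independently with probability p.

```python
import random
from collections import Counter


def estimate_pmf(simulate_round, k, num_rounds, seed=0):
    """Estimate Pr[I(k) = i] for i = 0..k as (number of rounds with I(k) = i) / K."""
    rng = random.Random(seed)
    counts = Counter(simulate_round(rng) for _ in range(num_rounds))
    return [counts[i] / num_rounds for i in range(k + 1)]


def toy_round(rng, k=3, p=0.5):
    """Hypothetical round: each of the k top papers is accepted with probability p."""
    return sum(rng.random() < p for _ in range(k))


pmf_hat = estimate_pmf(toy_round, k=3, num_rounds=10000)
```

Each entry of `pmf_hat` is an average of K indicator variables, which is exactly the quantity the Chernoff bound of Theorem 7 is applied to.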

Lemma 3 shows the performance guarantee on $\widehat{\Pr}[I(k)=i]$ with success probability at least
$1 - \delta/(k+1)$ for each specific $i \in \{0, 1, \ldots, k\}$. What, then, is the success probability for all
$i = 0, 1, \ldots, k$ simultaneously? The answer to this question is stated in the following lemma.

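Lemma 4 When the following condition holds:

$$K \ge \max_{i:\, \Pr[I(k)=i] > 0} \frac{3\ln(2(k+1)/\delta)}{\epsilon^2 \Pr[I(k)=i]}$$

then

$$\left|\widehat{\Pr}[I(k)=i] - \Pr[I(k)=i]\right| \le \epsilon \Pr[I(k)=i]$$

holds for all $i = 0, 1, \ldots, k$, with probability at least $1 - \delta$.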

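Proof: Let $E_i$ denote the event that $|\widehat{\Pr}[I(k)=i] - \Pr[I(k)=i]| \le \epsilon\Pr[I(k)=i]$ holds. From
Lemma 3 we see that for each $i = 0, 1, \ldots, k$, the condition

$$\left|\widehat{\Pr}[I(k)=i] - \Pr[I(k)=i]\right| \le \epsilon \Pr[I(k)=i]$$

holds with probability at least $1 - \delta/(k+1)$, thus

$$\Pr[E_i] \ge 1 - \delta/(k+1).$$

Our goal is to derive the probability that the condition

$$\left|\widehat{\Pr}[I(k)=i] - \Pr[I(k)=i]\right| \le \epsilon \Pr[I(k)=i]$$

holds for all $i = 0, 1, \ldots, k$. Specifically, we want the probability that the events $E_0, E_1, \ldots, E_k$
all happen, or

$$\Pr[E_0, \ldots, E_k].$$

Note that the physical meaning of $E_i$ is that $\widehat{\Pr}[I(k)=i]$, the approximate value of $\Pr[I(k)=i]$, is
close to the true value of $\Pr[I(k)=i]$. Thus the physical meaning of $\cap_{i\in\mathcal{F}} E_i$, where
$\mathcal{F} \subseteq \{0, \ldots, k\}$, is that the approximate value of $\Pr[I(k)=i]$ is close to the true value
of $\Pr[I(k)=i]$ for all $i \in \mathcal{F}$. Thus, if $\cap_{i\in\mathcal{F}} E_i$ happens, then
$\sum_{i\in\mathcal{F}} \widehat{\Pr}[I(k)=i]$, the approximate value of $\sum_{i\in\mathcal{F}} \Pr[I(k)=i]$, is
close to the real value of $\sum_{i\in\mathcal{F}} \Pr[I(k)=i]$.

Since

$$\sum_{i\in\mathcal{F}} \widehat{\Pr}[I(k)=i] + \sum_{i\in\bar{\mathcal{F}}} \widehat{\Pr}[I(k)=i]
= \sum_{i\in\mathcal{F}} \Pr[I(k)=i] + \sum_{i\in\bar{\mathcal{F}}} \Pr[I(k)=i] = 1,$$

where $\bar{\mathcal{F}} = \{0, 1, \ldots, k\} \setminus \mathcal{F}$ is the complement of $\mathcal{F}$, we have
that $\sum_{i\in\mathcal{F}} \widehat{\Pr}[I(k)=i]$ is close to the real value of
$\sum_{i\in\mathcal{F}} \Pr[I(k)=i]$ if and only if $\sum_{i\in\bar{\mathcal{F}}} \widehat{\Pr}[I(k)=i]$ is close to
the real value of $\sum_{i\in\bar{\mathcal{F}}} \Pr[I(k)=i]$. Thus $\cap_{i\in\mathcal{F}} E_i$ implies that
$\sum_{i\in\bar{\mathcal{F}}} \widehat{\Pr}[I(k)=i]$ is close to the real value of
$\sum_{i\in\bar{\mathcal{F}}} \Pr[I(k)=i]$.

The event $\cap_{i\in\mathcal{G}} E_i$, where $\mathcal{G} \subseteq \bar{\mathcal{F}}$, is more likely to happen
given the prior information that $\sum_{i\in\bar{\mathcal{F}}} \widehat{\Pr}[I(k)=i]$ is close to the real value of
$\sum_{i\in\bar{\mathcal{F}}} \Pr[I(k)=i]$ than given nothing at all. Since $\cap_{i\in\mathcal{F}} E_i$ implies
exactly this closeness, $\cap_{i\in\mathcal{G}} E_i$, where $\mathcal{G} \subseteq \bar{\mathcal{F}}$, is more likely
to happen given the prior information that $\cap_{i\in\mathcal{F}} E_i$ happens than given nothing at all. Or
mathematically,

$$\Pr\left[\cap_{i\in\mathcal{G}} E_i \mid \cap_{i\in\mathcal{F}} E_i\right] \ge \Pr\left[\cap_{i\in\mathcal{G}} E_i\right],$$

where $\mathcal{G} \subseteq \bar{\mathcal{F}}$. Based on this fact, we have

$$\Pr[E_0, \ldots, E_k] = \Pr[E_0]\,\Pr\left[\cap_{j\in\bar{\mathcal{F}}_0} E_j \mid E_0\right]
\ge \Pr[E_0]\,\Pr\left[\cap_{j\in\bar{\mathcal{F}}_0} E_j\right] \ge \cdots \ge \prod_{i=0}^{k} \Pr[E_i],$$

where $\bar{\mathcal{F}}_i = \{0, \ldots, k\} \setminus \{0, \ldots, i\}$. By substituting $\Pr[E_i]$ with
$\Pr[E_i] \ge 1 - \delta/(k+1)$ we have

$$\Pr[E_0, \ldots, E_k] \ge \left[1 - \delta/(k+1)\right]^{k+1} \ge 1 - \delta,$$

which completes the proof. ■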
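The final inequality in this proof is an instance of Bernoulli's inequality, $(1-x)^m \ge 1 - mx$ for $0 \le x \le 1$ and integer $m \ge 1$. Written out with $x = \delta/(k+1)$ and $m = k+1$ (the symbols $x$ and $m$ are introduced here only for illustration):

```latex
\[
\left(1 - \frac{\delta}{k+1}\right)^{k+1} \;\ge\; 1 - (k+1)\cdot\frac{\delta}{k+1} \;=\; 1 - \delta .
\]
```

As a side remark, the same $1 - \delta$ guarantee also follows directly from the union bound, $\Pr[\cup_i \bar{E}_i] \le \sum_{i=0}^{k} \Pr[\bar{E}_i] \le (k+1)\cdot\delta/(k+1) = \delta$, which needs no assumption on how the events $E_i$ depend on each other.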

In the following lemma we derive the error bound of the expectation, or $|\hat{E}[I(k)] - E[I(k)]|$, under the
condition that $|\widehat{\Pr}[I(k)=i] - \Pr[I(k)=i]| \le \epsilon\Pr[I(k)=i]$ holds for all $i = 0, 1, \ldots, k$.

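Lemma 5 When the following condition holds:

$$\left|\widehat{\Pr}[I(k)=i] - \Pr[I(k)=i]\right| \le \epsilon \Pr[I(k)=i],$$

for all $i = 0, 1, \ldots, k$, then

$$\left|\hat{E}[I(k)] - E[I(k)]\right| \le \epsilon E[I(k)]$$

holds.

Proof: The proof is quite straightforward:

$$\left|\hat{E}[I(k)] - E[I(k)]\right|
= \left|\sum_{i=0}^{k} i\left(\widehat{\Pr}[I(k)=i] - \Pr[I(k)=i]\right)\right|
\le \sum_{i=0}^{k} i\,\epsilon \Pr[I(k)=i] = \epsilon E[I(k)],$$

which completes the proof. ■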



In the following, we let $\Delta E[I(k)] = \hat{E}[I(k)] - E[I(k)]$ and
$\Delta p_i = \widehat{\Pr}[I(k)=i] - \Pr[I(k)=i]$. We first derive the bound of $(\Delta E[I(k)])^2$, and then apply
it to derive the bound of the variance, or $|\widehat{\mathrm{Var}}[I(k)] - \mathrm{Var}[I(k)]|$. The bound of
$(\Delta E[I(k)])^2$ is stated in the following lemma.

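Lemma 6 When the following condition holds:

$$\left|\widehat{\Pr}[I(k)=i] - \Pr[I(k)=i]\right| \le \epsilon \Pr[I(k)=i]$$

for all $i = 0, 1, \ldots, k$, then

$$(\Delta E[I(k)])^2 \le \epsilon^2 \mathrm{Var}[I(k)] \qquad (32)$$

holds.

Proof: First we have $|\Delta p_i| \le \epsilon\Pr[I(k)=i]$. It is straightforward to show

$$\sum_{i=0}^{k} \Delta p_i = \sum_{i=0}^{k} \widehat{\Pr}[I(k)=i] - \sum_{i=0}^{k} \Pr[I(k)=i] = 0.$$

Based on this fact, we have

$$(\Delta E[I(k)])^2 = \left(\sum_{i=0}^{k} i\,\Delta p_i\right)^2
= \left(\sum_{i=0}^{k} (i - E[I(k)])\,\Delta p_i\right)^2
\le \left(\sum_{i=0}^{k} |i - E[I(k)]|\,|\Delta p_i|\right)^2. \qquad (33)$$

Then by substituting $|\Delta p_i|$ with $|\Delta p_i| \le \epsilon\Pr[I(k)=i]$ we have

$$(\Delta E[I(k)])^2 \le \left(\sum_{i=0}^{k} |i - E[I(k)]|\,\epsilon \Pr[I(k)=i]\right)^2,$$

and by applying Cauchy's inequality we have

$$(\Delta E[I(k)])^2 \le \left(\sum_{i=0}^{k} \epsilon^2 \Pr[I(k)=i]\right)\left(\sum_{i=0}^{k} (i - E[I(k)])^2 \Pr[I(k)=i]\right)
= \epsilon^2 \mathrm{Var}[I(k)],$$

which completes the proof. ■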

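In the following lemma we apply Lemma 6 to derive the error bound of the variance, or
$|\widehat{\mathrm{Var}}[I(k)] - \mathrm{Var}[I(k)]|$.

Lemma 7 When the following condition holds:

$$\left|\widehat{\Pr}[I(k)=i] - \Pr[I(k)=i]\right| \le \epsilon \Pr[I(k)=i]$$

for all $i = 0, 1, \ldots, k$, then

$$\left|\widehat{\mathrm{Var}}[I(k)] - \mathrm{Var}[I(k)]\right| \le \epsilon(1+\epsilon)\mathrm{Var}[I(k)]$$

holds.

Proof: First we can write $\widehat{\mathrm{Var}}[I(k)] - \mathrm{Var}[I(k)]$ in the following form:

$$\widehat{\mathrm{Var}}[I(k)] - \mathrm{Var}[I(k)] = \sum_{i=0}^{k} i^2 \Delta p_i - 2E[I(k)]\,\Delta E[I(k)] - (\Delta E[I(k)])^2.$$

Since $\sum_{i=0}^{k} \Delta p_i = 0$, we have

$$\left|\widehat{\mathrm{Var}}[I(k)] - \mathrm{Var}[I(k)]\right|
= \left|\sum_{i=0}^{k} (i - E[I(k)])^2 \Delta p_i - (\Delta E[I(k)])^2\right|
\le \sum_{i=0}^{k} (i - E[I(k)])^2 |\Delta p_i| + (\Delta E[I(k)])^2, \qquad (34)$$

and by substituting $|\Delta p_i|$ with $|\Delta p_i| \le \epsilon\Pr[I(k)=i]$, and $(\Delta E[I(k)])^2$ with
Inequality (32), we have

$$\left|\widehat{\mathrm{Var}}[I(k)] - \mathrm{Var}[I(k)]\right| \le \epsilon(1+\epsilon)\mathrm{Var}[I(k)],$$

which completes the proof. ■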
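The bounds on the mean and variance under a relative pmf error can be checked numerically on a small example. The sketch below uses a made-up pmf on {0, 1, 2, 3} and a made-up estimate whose entries are each within a relative error eps of the true values; both vectors are assumptions chosen for illustration, not data from the paper.

```python
def mean_var(pmf):
    """Mean and variance of a distribution on {0, 1, ..., len(pmf) - 1}."""
    mean = sum(i * p for i, p in enumerate(pmf))
    var = sum((i - mean) ** 2 * p for i, p in enumerate(pmf))
    return mean, var


eps = 0.1
true_pmf = [0.1, 0.2, 0.3, 0.4]
# Hypothetical estimate: entries sum to 1 and |est - true| <= eps * true entrywise.
est_pmf = [0.105, 0.21, 0.285, 0.4]
assert all(abs(q - p) <= eps * p for p, q in zip(true_pmf, est_pmf))

mean, var = mean_var(true_pmf)
est_mean, est_var = mean_var(est_pmf)
delta_mean = est_mean - mean

assert abs(delta_mean) <= eps * mean                 # bound on the expectation
assert delta_mean ** 2 <= eps ** 2 * var             # bound on the squared mean error
assert abs(est_var - var) <= eps * (1 + eps) * var   # bound on the variance
```

In this example the true mean and variance are 2.0 and 1.0, the estimated ones are 1.98 and 1.0296, and all three bounds hold with room to spare, which is expected since the bounds are worst-case.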

In the following lemmas, we show how to trade off the number of simulation rounds $K$ against the performance
guarantees. The tradeoff between $K$ and the performance guarantee on the pmf is stated in the following lemma.

Lemma 8 When the following condition holds:

$$K \ge 3\ln(2(k+1)/\delta)/\epsilon^2 \qquad (35)$$

then for each $i = 0, \ldots, k$,

$$\left|\widehat{\Pr}[I(k)=i] - \Pr[I(k)=i]\right| \le \epsilon\sqrt{\Pr[I(k)=i]}$$

holds with probability at least $1 - \delta/(k+1)$.

Proof: Without loss of generality, consider the performance guarantee on the approximation of $\Pr[I(k)=i]$,
where $i \in \{0, 1, \ldots, k\}$. Our goal is to show

$$\left|\widehat{\Pr}[I(k)=i] - \Pr[I(k)=i]\right| \le \epsilon\sqrt{\Pr[I(k)=i]}.$$

Following a method similar to that of Lemma 3, and with the same notation as in its proof, we have the following:

Case 1: $\Pr[I(k)=i] = 0$. Then $\widehat{\Pr}[I(k)=i] = 0$. Hence

$$\left|\widehat{\Pr}[I(k)=i] - \Pr[I(k)=i]\right| \le \epsilon\sqrt{\Pr[I(k)=i]}.$$

Case 2: $\Pr[I(k)=i] \ne 0$. Following a method similar to that of Lemma 3 we have

$$\Pr\left[\left|\widehat{\Pr}[I(k)=i] - \Pr[I(k)=i]\right| \le \epsilon\sqrt{\Pr[I(k)=i]}\right]
\ge 1 - \Pr\left[\left|\widehat{\Pr}[I(k)=i] - \Pr[I(k)=i]\right| > \epsilon\sqrt{\Pr[I(k)=i]}\right]
= 1 - \Pr\left[\left|I_i - K\Pr[I(k)=i]\right| > \frac{\epsilon K \Pr[I(k)=i]}{\sqrt{\Pr[I(k)=i]}}\right],$$

and by substituting $K$ with Inequality (35) we can prove this lemma.
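The number of simulation rounds required by a condition of the form K ≥ 3 ln(2(k+1)/δ)/ε² is easy to compute. The helper below is a small sketch; the function name `rounds_needed` and the input validation are assumptions added for illustration.

```python
import math


def rounds_needed(k, delta, eps):
    """Smallest integer K with K >= 3 * ln(2 * (k + 1) / delta) / eps**2."""
    if k < 0 or not (0 < delta < 1) or eps <= 0:
        raise ValueError("need k >= 0, 0 < delta < 1 and eps > 0")
    return math.ceil(3 * math.log(2 * (k + 1) / delta) / eps ** 2)
```

For instance, `rounds_needed(30, 0.1, 0.1)` evaluates to a little over 1,900 rounds. The bound grows only logarithmically in k and 1/δ but quadratically in 1/ε, so halving ε roughly quadruples the number of rounds.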

Using the same method as in Lemma 3, we can prove the following lemma.

Lemma 9 When $K \ge 3\ln(2(k+1)/\delta)/\epsilon^2$, then

$$\left|\widehat{\Pr}[I(k)=i] - \Pr[I(k)=i]\right| \le \epsilon\sqrt{\Pr[I(k)=i]}$$

holds for all $i = 0, 1, \ldots, k$ with probability at least $1 - \delta$.

In the following lemma we derive the error bound of the expectation, or $|\hat{E}[I(k)] - E[I(k)]|$, under the
condition that $|\widehat{\Pr}[I(k)=i] - \Pr[I(k)=i]| \le \epsilon\sqrt{\Pr[I(k)=i]}$ holds for all
$i = 0, 1, \ldots, k$.

Lemma 10 When the following condition holds:

$$\left|\widehat{\Pr}[I(k)=i] - \Pr[I(k)=i]\right| \le \epsilon\sqrt{\Pr[I(k)=i]}$$

for all $i = 0, 1, \ldots, k$, then

$$\left|\hat{E}[I(k)] - E[I(k)]\right| \le \epsilon\sqrt{(k+1)k\,E[I(k)]/2}$$

holds.

Proof: First we have $|\Delta p_i| \le \epsilon\sqrt{\Pr[I(k)=i]}$. Then we have the following:

$$\left|\hat{E}[I(k)] - E[I(k)]\right| \le \sum_{i=0}^{k} i\,|\Delta p_i| \le \sum_{i=0}^{k} i\,\epsilon\sqrt{\Pr[I(k)=i]},$$

and by applying Cauchy's inequality we have

$$\left|\hat{E}[I(k)] - E[I(k)]\right| \le \epsilon\sqrt{\sum_{i=0}^{k} i}\,\sqrt{\sum_{i=0}^{k} i\,\Pr[I(k)=i]}
= \epsilon\sqrt{(k+1)k\,E[I(k)]/2},$$

which completes the proof. ■

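In the following lemma we derive the bound of $(\Delta E[I(k)])^2$ under the condition that
$|\widehat{\Pr}[I(k)=i] - \Pr[I(k)=i]| \le \epsilon\sqrt{\Pr[I(k)=i]}$ holds for all $i = 0, 1, \ldots, k$.

Lemma 11 When the following condition holds:

$$\left|\widehat{\Pr}[I(k)=i] - \Pr[I(k)=i]\right| \le \epsilon\sqrt{\Pr[I(k)=i]}$$

for all $i = 0, 1, \ldots, k$, then

$$(\Delta E[I(k)])^2 \le \epsilon^2 (k+1)\mathrm{Var}[I(k)] \qquad (36)$$

holds.

Proof: First we have $|\Delta p_i| \le \epsilon\sqrt{\Pr[I(k)=i]}$. From Inequality (33) we have

$$(\Delta E[I(k)])^2 \le \left(\sum_{i=0}^{k} |i - E[I(k)]|\,|\Delta p_i|\right)^2,$$

and by substituting $|\Delta p_i|$ with $|\Delta p_i| \le \epsilon\sqrt{\Pr[I(k)=i]}$ we have

$$(\Delta E[I(k)])^2 \le \left(\sum_{i=0}^{k} |i - E[I(k)]|\,\epsilon\sqrt{\Pr[I(k)=i]}\right)^2.$$

By applying Cauchy's inequality we have

$$(\Delta E[I(k)])^2 \le \left(\sum_{i=0}^{k} \epsilon^2\right)\left(\sum_{i=0}^{k} (i - E[I(k)])^2 \Pr[I(k)=i]\right)
= \epsilon^2 (k+1)\mathrm{Var}[I(k)],$$

which completes the proof. ■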

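In the following lemma we apply Lemma 11 to derive the error bound of the variance, or
$|\widehat{\mathrm{Var}}[I(k)] - \mathrm{Var}[I(k)]|$, under the condition that
$|\widehat{\Pr}[I(k)=i] - \Pr[I(k)=i]| \le \epsilon\sqrt{\Pr[I(k)=i]}$ holds for all $i = 0, 1, \ldots, k$.

Lemma 12 When the following condition holds:

$$\left|\widehat{\Pr}[I(k)=i] - \Pr[I(k)=i]\right| \le \epsilon\sqrt{\Pr[I(k)=i]}$$

for all $i = 0, 1, \ldots, k$, then

$$\left|\widehat{\mathrm{Var}}[I(k)] - \mathrm{Var}[I(k)]\right|
\le \epsilon(k+1)\left(\sqrt{(2k+1)\mathrm{Var}[I(k)]/6} + \epsilon\,\mathrm{Var}[I(k)]\right)$$

holds.

Proof: First we have $|\Delta p_i| \le \epsilon\sqrt{\Pr[I(k)=i]}$. From Inequality (34) we have

$$\left|\widehat{\mathrm{Var}}[I(k)] - \mathrm{Var}[I(k)]\right| \le \sum_{i=0}^{k} (i - E[I(k)])^2 |\Delta p_i| + (\Delta E[I(k)])^2,$$

and by substituting $|\Delta p_i|$ with $|\Delta p_i| \le \epsilon\sqrt{\Pr[I(k)=i]}$ we have

$$\left|\widehat{\mathrm{Var}}[I(k)] - \mathrm{Var}[I(k)]\right|
\le \sum_{i=0}^{k} (i - E[I(k)])^2\,\epsilon\sqrt{\Pr[I(k)=i]} + (\Delta E[I(k)])^2.$$

By applying Cauchy's inequality we have

$$\left|\widehat{\mathrm{Var}}[I(k)] - \mathrm{Var}[I(k)]\right|
\le \epsilon\sqrt{\sum_{i=0}^{k} (i - E[I(k)])^2 \Pr[I(k)=i]}\,\sqrt{\sum_{i=0}^{k} (i - E[I(k)])^2} + (\Delta E[I(k)])^2
\le \epsilon(k+1)\sqrt{(2k+1)\mathrm{Var}[I(k)]/6} + (\Delta E[I(k)])^2,$$

and the proof can be completed by substituting $(\Delta E[I(k)])^2$ with Inequality (36).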
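The square-root-type bounds can also be checked numerically. The sketch below assumes the bounds take the forms |ΔE| ≤ ε·sqrt(k(k+1)·E/2), (ΔE)² ≤ ε²(k+1)·Var, and |ΔVar| ≤ ε(k+1)(sqrt((2k+1)·Var/6) + ε·Var), as they follow from the Cauchy inequality steps in these proofs; the two pmf vectors are made up for illustration.

```python
import math


def mean_var(pmf):
    """Mean and variance of a distribution on {0, 1, ..., len(pmf) - 1}."""
    mean = sum(i * p for i, p in enumerate(pmf))
    var = sum((i - mean) ** 2 * p for i, p in enumerate(pmf))
    return mean, var


eps = 0.05
true_pmf = [0.1, 0.2, 0.3, 0.4]
# Hypothetical estimate: entries sum to 1 and |est - true| <= eps * sqrt(true).
est_pmf = [0.11, 0.22, 0.28, 0.39]
k = len(true_pmf) - 1
assert all(abs(q - p) <= eps * math.sqrt(p) for p, q in zip(true_pmf, est_pmf))

mean, var = mean_var(true_pmf)
est_mean, est_var = mean_var(est_pmf)
delta_mean = est_mean - mean

assert abs(delta_mean) <= eps * math.sqrt(k * (k + 1) * mean / 2)
assert delta_mean ** 2 <= eps ** 2 * (k + 1) * var
assert abs(est_var - var) <= eps * (k + 1) * (math.sqrt((2 * k + 1) * var / 6) + eps * var)
```

Here the mean shifts from 2.0 to 1.95 and the variance from 1.0 to 1.0475. The bounds are considerably looser than the observed errors, which reflects the factor depending on k that the absolute-error condition introduces.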
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